You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/25 20:28:42 UTC

[GitHub] [arrow-rs] alamb opened a new issue, #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

alamb opened a new issue, #1620:
URL: https://github.com/apache/arrow-rs/issues/1620

   Good idea checking the C++ version. The docs for binary layout also mention
   
   > The offsets buffer contains length + 1 signed integers
   > ...
   > and the last slot is the length of the values array
   
   which would mean there has to be a single zero in the offsets buffer for an empty `ListArray`.
   
   If this is a requirements it would be better to validate it when creating `ArrayData`. The code in [`ArrayData::validate_each_offset`](https://github.com/apache/arrow-rs/blob/b4642ecf9cb4a3d8b5741655fdc7d06b71bf41e6/arrow/src/array/data.rs#L1045) explicitly allow this case though:
   
   ```rust
           // An empty binary-like array can have 0 offsets
           if self.len == 0 && offset_buffer.is_empty() {
               return Ok(());
           }
   ```
   
   I'm more hesitant to change this now, maybe let's wait for some more eyes on this.
   
   _Originally posted by @jhorstmann in https://github.com/apache/arrow-rs/pull/1602#discussion_r854389543_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1120604872

   @alamb Could you please help to recheck that whether empty array has empty offsets buffer in arrow-testing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1120976700

   But I also find list array does not allow empty offsets:
   https://github.com/apache/arrow/blob/c70426f73326b3852d1bd7c31d98be4743f3fcba/cpp/src/arrow/array/array_nested.cc#L111-L113


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1120974176

   Does it mean that for empty (list array, binary array, and string array), both an empty offsets buffer and an offsets buffer with a single zero are allowed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1123722384

   I think the conclusion is "we should keep the current implementation, as imperfect as it may be, for backwards compatibility reasons"
   
   Agree with closing -- anyone who disagrees please let me know and I'll reopen it
   
   Thanks for the effort @HaoYang670 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb closed issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
alamb closed issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.
URL: https://github.com/apache/arrow-rs/issues/1620


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1121836317

   Personally, I prefer keeping the current implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1120381899

   I got this error:
   ```rust
   ---- ipc::reader::tests::read_generated_files_014 stdout ----
   ---- ipc::reader::tests::read_generated_files_014 stdout ----
   thread 'ipc::reader::tests::read_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29
   
   ---- ipc::reader::tests::read_generated_streams_014 stdout ----
   thread 'ipc::reader::tests::read_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29
   
   ---- ipc::writer::tests::read_and_rewrite_generated_files_014 stdout ----
   thread 'ipc::writer::tests::read_and_rewrite_generated_files_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29
   
   ---- ipc::writer::tests::read_and_rewrite_generated_streams_014 stdout ----
   thread 'ipc::writer::tests::read_and_rewrite_generated_streams_014' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Offsets buffer size (bytes): 0 isn't large enough for Binary. Length 0 needs 1")', arrow/src/ipc/reader.rs:276:29
   ```
   after deleting
   https://github.com/apache/arrow-rs/blob/master/arrow/src/array/data.rs#L737-L740 
   and
   https://github.com/apache/arrow-rs/blob/master/arrow/src/array/data.rs#L1052-L1055
   
   I don't know how data is generated in arrow-testing. Do empty arrays have empty offsets buffer in the testing data?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1123263695

   Could we close this issue, or open it for more discussions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] jorgecarleitao commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1121011247

   Hey, there is one file in the test files that has no offset - it is the file for version 0.14 or something. My understanding is that previous versions of the (IPC) spec did not require offset and recent versions require them.
   
   I have a comment about this [in arrow2](https://github.com/jorgecarleitao/arrow2/blob/main/src/io/ipc/read/array/binary.rs#L45).
   
   (I am curious about the mailing list discussion though; this is not very authoritative ^^)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1122214154

   Email link for cross reference: https://lists.apache.org/thread/w7g1zfqrjxx0bvrct0mt5zwxvdnc9nob


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1120937663

   Hi @HaoYang670. I looked at the C++ implementation and I think it does allow an empty offsets buffer for an empty array -- namely the validation code skips doing any checks if `data.len == 0`:
   
   https://github.com/apache/arrow/blob/c715bebbd89089f385c9996560866da23ea1ddda/cpp/src/arrow/array/validate.cc#L550
   
   > I don't know how data is generated in arrow-testing.
   
   I also don't know how this data was generated


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1121001426

   > But it seems like list array does not allow empty offsets:
   
   I think that code is for constructing a new array programatically. The validation code I think may be more permissive to support arrays that come in from other arrow implementations. 
   
   Maybe it is time to ask on the dev@arrow.apache.org mailing list 🤔  I am out of ideas


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1121010961

   I will send an email


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

Posted by GitBox <gi...@apache.org>.
HaoYang670 commented on issue #1620:
URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1121811171

   It seems like Arrow tends to be ambiguous in this case, and an empty buffer or a buffer with just a 0 in it both are valid. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org