You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/28 18:10:11 UTC

[GitHub] [arrow-nanoarrow] lidavidm opened a new issue, #76: [C] Constructed zero-length arrays lack values buffer

lidavidm opened a new issue, #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76

   I don't have a minimal example yet, but it looks like if you never append to an array, it won't have a values buffer (poking at it in the debugger). I think this is invalid, and the Arrow Go library crashes when it tries to import this. 
   
   https://arrow.apache.org/docs/format/Columnar.html explicitly allows omitting the validity buffer, but this language is not carried into other sections, hence I think it's required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] alamb commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330998965

   In the Rust validation code, I remember having to special case an empty buffer (I think one of the example IPC files in the test-data repository has this), but can't remember the details. 
   
   cc @tustvold who perhaps remembers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1335478715

   I'm OK to close (unless there's a code change in nanoarrow that we'd like to see?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1331027209

   It sounds like we should update Columnar.rst with that as well, just to resolve the ambiguity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1329620057

   I've double-checked and the array seems to be valid, except that the data buffer is NULL because the buffer length is 0. 
   
   So the question remains: is it valid to have non-validity buffer pointers be NULL in the C Data Interface? The C++ library [handles this case explicitly](https://github.com/apache/arrow/blob/ad54d6ca3aed0e1022d1b135817d855659d482f9/cpp/src/arrow/c/bridge.cc#L1569-L1578), Java [does not appear to](https://github.com/apache/arrow/blob/master/java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java) (but maybe it's done implicitly - needs testing). Go crashes when it [constructs the slice](https://github.com/apache/arrow/blob/4afe71030cdd9d3103c7b028082ba63bafdf5d27/go/arrow/cdata/cdata.go#L617):
   
   <details>
   
   ```
           panic({0x9cd620, 0xea6060})
           	/nix/store/5xayx09nj7rxwkf1c2pxzdslp4ilrv5v-go-1.18.5/share/go/src/runtime/panic.go:838 +0x207
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importBuffer(0xc0000e8de0, 0x1, 0x0)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:615 +0x216
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importBitsBuffer(0xc0000b5a00?, 0x200000003?)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:621 +0x2c
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importFixedSizePrimitive(0xc0000e8de0)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:580 +0x1ed
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImport(0xc0000e8de0, 0xc0000ace01?)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:409 +0x11af
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importChild(0x0?, 0x0?, 0x110000000000?)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:303 +0x36
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImportChildren(0xc00024a360)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:351 +0x516
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImport(0xc00024a360, 0x1?)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:391 +0x125
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importChild(0xc0000ed680?, 0x0?, 0x555555555555?)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:303 +0x36
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImportChildren(0xc000099bc0)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:340 +0x719
           github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImport(0xc000099bc0, 0x2?)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:391 +0x125
           github.com/apache/arrow/go/v10/arrow/cdata.importCArrayAsType(...)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:653
           github.com/apache/arrow/go/v10/arrow/cdata.ImportCRecordBatchWithSchema(0x1aa0a70?, 0xc000099b60)
           	/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/interface.go:115 +0xb5
           github.com/apache/arrow/go/v10/arrow/cdata.(*nativeCRecordBatchReader).next(0xc00027c0c0)
   ```
   
   </details>
   
   @pitrou @zeroshade what do you think? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] lidavidm closed issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
lidavidm closed issue #76: [C] Constructed zero-length arrays lack values buffer
URL: https://github.com/apache/arrow-nanoarrow/issues/76


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330949584

   Perhaps the Rust folks have an opinion as well. @alippai @alamb @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1329592406

   Actually, I wonder if this might be considered an error in Go instead, since `malloc(0)` may return NULL and it would be a little weird to say that we're forced to allocate. It's also possible I'm doing something dumb and the length is nonzero...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] zeroshade commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
zeroshade commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330973207

   @lidavidm I believe the reason why the Go library assumes that the data buffer is not nil is because of the way I read the specification in the documentation [here](https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers) which said buffers were mandatory and only specifies that the validity buffer *may* be null if null count is 0. Which, admittedly, says nothing about whether or not the data buffer is allowed to be null. 
   
   > since `malloc(0)` may return NULL and it would be a little weird to say that we're forced to allocate.
   
   I agree with this and that I should fix this in the Go implementation. At a minimum, it shouldn't crash I'll make a GH issue for this. I agree with @pitrou we should clarify in the spec, and my vote is that the spec should specifically state it is allowed for the data buffer to be nil if the length is 0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1335394317

   Thanks Antoine! I think we can close this then. Or do we want to leave this open to track using a valid pointer in these cases?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1333904539

   The offsets buffer I believe is really a special case for compatibility with old buggy Arrow implementations?
   
   The main question here is whether zero-length buffers can be omitted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330971319

   There is also a discussion about the offsets buffer of empty things here: https://lists.apache.org/thread/w7g1zfqrjxx0bvrct0mt5zwxvdnc9nob ...I am pretty sure that I always allocate an offsets buffer (at least when creating arrays via `StartAppending()`) but based on Micah's note on that thread I made sure not to reference an offsets buffer if the length was zero.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] tustvold commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1331023238

   The conclusion we came to on https://github.com/apache/arrow-rs/issues/1620 is that an empty offsets buffer is permitted, and there is additional logic to handle it correctly


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330949895

   Either way we need to clarify the spec.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1334179237

   The wording in the C Data Interface spec was clarified in https://github.com/apache/arrow/pull/14808


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer

Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1333914069

   > Actually, I wonder if this might be considered an error in Go instead, since `malloc(0)` may return NULL and it would be a little weird to say that we're forced to allocate.
   
   Note that doing arithmetic on null pointers is UB, so the use of null pointers is severely limited in practice. Forcing a valid pointer is good practice in general. We have explicit code for it Arrow C++:
   https://github.com/apache/arrow/blob/1d9f7781acf5e5b3679ac2d2aee9e837435ac87e/cpp/src/arrow/memory_pool.cc#L62


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org