You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/28 18:10:11 UTC
[GitHub] [arrow-nanoarrow] lidavidm opened a new issue, #76: [C] Constructed zero-length arrays lack values buffer
lidavidm opened a new issue, #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76
I don't have a minimal example yet, but it looks like if you never append to an array, it won't have a values buffer (poking at it in the debugger). I think this is invalid, and the Arrow Go library crashes when it tries to import this.
https://arrow.apache.org/docs/format/Columnar.html explicitly allows omitting the validity buffer, but this language is not carried into other sections, hence I think it's required.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] alamb commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330998965
In the Rust validation code, I remember having to special case an empty buffer (I think one of the example IPC files in the test-data repository has this), but can't remember the details.
cc @tustvold who perhaps remembers
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
paleolimbot commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1335478715
I'm OK to close (unless there's a code change in nanoarrow that we'd like to see?)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1331027209
It sounds like we should update Columnar.rst with that as well, just to resolve the ambiguity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1329620057
I've double-checked and the array seems to be valid, except that the data buffer is NULL because the buffer length is 0.
So the question remains: is it valid to have non-validity buffer pointers be NULL in the C Data Interface? The C++ library [handles this case explicitly](https://github.com/apache/arrow/blob/ad54d6ca3aed0e1022d1b135817d855659d482f9/cpp/src/arrow/c/bridge.cc#L1569-L1578), Java [does not appear to](https://github.com/apache/arrow/blob/master/java/c/src/main/java/org/apache/arrow/c/BufferImportTypeVisitor.java) (but maybe it's done implicitly - needs testing). Go crashes when it [constructs the slice](https://github.com/apache/arrow/blob/4afe71030cdd9d3103c7b028082ba63bafdf5d27/go/arrow/cdata/cdata.go#L617):
<details>
```
panic({0x9cd620, 0xea6060})
/nix/store/5xayx09nj7rxwkf1c2pxzdslp4ilrv5v-go-1.18.5/share/go/src/runtime/panic.go:838 +0x207
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importBuffer(0xc0000e8de0, 0x1, 0x0)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:615 +0x216
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importBitsBuffer(0xc0000b5a00?, 0x200000003?)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:621 +0x2c
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importFixedSizePrimitive(0xc0000e8de0)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:580 +0x1ed
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImport(0xc0000e8de0, 0xc0000ace01?)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:409 +0x11af
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importChild(0x0?, 0x0?, 0x110000000000?)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:303 +0x36
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImportChildren(0xc00024a360)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:351 +0x516
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImport(0xc00024a360, 0x1?)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:391 +0x125
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).importChild(0xc0000ed680?, 0x0?, 0x555555555555?)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:303 +0x36
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImportChildren(0xc000099bc0)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:340 +0x719
github.com/apache/arrow/go/v10/arrow/cdata.(*cimporter).doImport(0xc000099bc0, 0x2?)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:391 +0x125
github.com/apache/arrow/go/v10/arrow/cdata.importCArrayAsType(...)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/cdata.go:653
github.com/apache/arrow/go/v10/arrow/cdata.ImportCRecordBatchWithSchema(0x1aa0a70?, 0xc000099b60)
/home/lidavidm/go/pkg/mod/github.com/apache/arrow/go/v10@v10.0.0-20220907151818-ff3aa3b7bb31/arrow/cdata/interface.go:115 +0xb5
github.com/apache/arrow/go/v10/arrow/cdata.(*nativeCRecordBatchReader).next(0xc00027c0c0)
```
</details>
@pitrou @zeroshade what do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] lidavidm closed issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
lidavidm closed issue #76: [C] Constructed zero-length arrays lack values buffer
URL: https://github.com/apache/arrow-nanoarrow/issues/76
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330949584
Perhaps the Rust folks have an opinion as well. @alippai @alamb @viirya
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1329592406
Actually, I wonder if this might be considered an error in Go instead, since `malloc(0)` may return NULL and it would be a little weird to say that we're forced to allocate. It's also possible I'm doing something dumb and the length is nonzero...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] zeroshade commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
zeroshade commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330973207
@lidavidm I believe the reason why the Go library assumes that the data buffer is not nil is because of the way I read the specification in the documentation [here](https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowArray.buffers) which said buffers were mandatory and only specifies that the validity buffer *may* be null if null count is 0. Which, admittedly, says nothing about whether or not the data buffer is allowed to be null.
> since `malloc(0)` may return NULL and it would be a little weird to say that we're forced to allocate.
I agree with this and that I should fix this in the Go implementation. At a minimum, it shouldn't crash I'll make a GH issue for this. I agree with @pitrou we should clarify in the spec, and my vote is that the spec should specifically state it is allowed for the data buffer to be nil if the length is 0.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] lidavidm commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
lidavidm commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1335394317
Thanks Antoine! I think we can close this then. Or do we want to leave this open to track using a valid pointer in these cases?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1333904539
The offsets buffer I believe is really a special case for compatibility with old buggy Arrow implementations?
The main question here is whether zero-length buffers can be omitted.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] paleolimbot commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
paleolimbot commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330971319
There is also a discussion about the offsets buffer of empty things here: https://lists.apache.org/thread/w7g1zfqrjxx0bvrct0mt5zwxvdnc9nob ...I am pretty sure that I always allocate an offsets buffer (at least when creating arrays via `StartAppending()`) but based on Micah's note on that thread I made sure not to reference an offsets buffer if the length was zero.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] tustvold commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1331023238
The conclusion we came to on https://github.com/apache/arrow-rs/issues/1620 is that an empty offsets buffer is permitted, and there is additional logic to handle it correctly
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1330949895
Either way we need to clarify the spec.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1334179237
The wording in the C Data Interface spec was clarified in https://github.com/apache/arrow/pull/14808
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-nanoarrow] pitrou commented on issue #76: [C] Constructed zero-length arrays lack values buffer
Posted by GitBox <gi...@apache.org>.
pitrou commented on issue #76:
URL: https://github.com/apache/arrow-nanoarrow/issues/76#issuecomment-1333914069
> Actually, I wonder if this might be considered an error in Go instead, since `malloc(0)` may return NULL and it would be a little weird to say that we're forced to allocate.
Note that doing arithmetic on null pointers is UB, so the use of null pointers is severely limited in practice. Forcing a valid pointer is good practice in general. We have explicit code for it Arrow C++:
https://github.com/apache/arrow/blob/1d9f7781acf5e5b3679ac2d2aee9e837435ac87e/cpp/src/arrow/memory_pool.cc#L62
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org