You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "snizovtsev (via GitHub)" <gi...@apache.org> on 2023/03/12 16:22:00 UTC
[GitHub] [arrow] snizovtsev opened a new issue, #34538: [Compute] Output preallocation for kernels producing structs
snizovtsev opened a new issue, #34538:
URL: https://github.com/apache/arrow/issues/34538
### Describe the enhancement requested
Hello,
On my project I faced the need to emit a pair of columns from my compute kernel. Since kernel's signatures allows only 1 output argument I wrapped multiple columns under Struct type. Then I realize that `MemAllocation::PREALLOCATE` doesn't allocate child buffers where struct's fields are stored:
```cpp
Result<std::shared_ptr<ArrayData>> KernelExecutorImpl::PrepareOutput(int64_t length) {
auto out = std::make_shared<ArrayData>(output_type_.GetSharedPtr(), length);
out->buffers.resize(output_num_buffers_);
...
for (size_t i = 0; i < data_preallocated_.size(); ++i) {
const auto& prealloc = data_preallocated_[i];
if (prealloc.bit_width >= 0) {
ARROW_ASSIGN_OR_RAISE(
out->buffers[i + 1],
AllocateDataBuffer(kernel_ctx_, length + prealloc.added_length,
prealloc.bit_width));
}
}
// out->child_data untouched
return out;
}
```
and then
```cpp
arrow::Status UdfExec(cp::KernelContext* ctx, const cp::ExecSpan& batch,
cp::ExecResult* out)
{
CHECK_EQ(out->array_data()->buffers.size(), 1); // validity bitmap allocated
CHECK_EQ(out->array_data()->child_data.size(), 0); // no data for struct fields allocated
return arrow::Status::OK();
}
```
As workaround have to set `kernel.mem_allocation = cp::MemAllocation::NO_PREALLOCATE;` and use regular builders in my kernel:
```cpp
arrow::Status UdfExec(cp::KernelContext* ctx, const cp::ExecSpan& batch,
cp::ExecResult* out)
{
std::unique_ptr<arrow::ArrayBuilder> builder;
RETURN_NOT_OK(arrow::MakeBuilder(ctx->memory_pool(),
out->array_data()->type,
&builder));
auto struct_ = dynamic_cast<arrow::StructBuilder*>(builder.get());
auto x = dynamic_cast<arrow::UInt32Builder*>(builder->child(0));
auto y = dynamic_cast<arrow::UInt64Builder*>(builder->child(1));
RETURN_NOT_OK(builder->Reserve(batch.length));
RETURN_NOT_OK(x->Reserve(batch.length));
RETURN_NOT_OK(y->Reserve(batch.length));
const uint64_t* pn = batch[0].array.GetValues<uint64_t>(1);
for (int64_t i = 0; i < batch.length; ++i) {
RETURN_NOT_OK(x->Append(pn[i] % 10));
RETURN_NOT_OK(y->Append(pn[i] ^ 1231231231231));
RETURN_NOT_OK(struct_->Append());
}
RETURN_NOT_OK(builder->FinishInternal(&std::get<std::shared_ptr<arrow::ArrayData>>(out->value)));
return arrow::Status::OK();
}
...
void RegisterUdf() {
auto out_type = arrow::struct_({
arrow::field("x", arrow::uint32()),
arrow::field("y", arrow::uint64()),
});
cp::ScalarKernel kernel({arrow::uint64()}, out_type, UdfExec);
...
}
```
I think it would be useful to support such case in kernel's memory preallocation. Or my workaround would help someone else.
### Component(s)
C++
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] snizovtsev commented on issue #34538: [Compute] Output preallocation for kernels producing structs
Posted by "snizovtsev (via GitHub)" <gi...@apache.org>.
snizovtsev commented on issue #34538:
URL: https://github.com/apache/arrow/issues/34538#issuecomment-1479818234
Yes, I think I can try to contribute PR for this issue some time later in a spare time. After some changes in a project where I use arrow I don't need to return struct anymore.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] westonpace commented on issue #34538: [Compute] Output preallocation for kernels producing structs
Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34538:
URL: https://github.com/apache/arrow/issues/34538#issuecomment-1468673436
This sounds like a good idea to me. Are you interested in creating a PR?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org