You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@arrow.apache.org by "snizovtsev (via GitHub)" <gi...@apache.org> on 2023/03/12 16:22:00 UTC

[GitHub] [arrow] snizovtsev opened a new issue, #34538: [Compute] Output preallocation for kernels producing structs

snizovtsev opened a new issue, #34538:
URL: https://github.com/apache/arrow/issues/34538

   ### Describe the enhancement requested
   
   Hello,
   
   On my project I faced the need to emit a pair of columns from my compute kernel. Since kernel's signatures allows only 1 output argument I wrapped multiple columns under Struct type. Then I realize that `MemAllocation::PREALLOCATE` doesn't allocate child buffers where struct's fields are stored:
   
   ```cpp
     Result<std::shared_ptr<ArrayData>> KernelExecutorImpl::PrepareOutput(int64_t length) {
       auto out = std::make_shared<ArrayData>(output_type_.GetSharedPtr(), length);
       out->buffers.resize(output_num_buffers_);
   
   ...
       for (size_t i = 0; i < data_preallocated_.size(); ++i) {
         const auto& prealloc = data_preallocated_[i];
         if (prealloc.bit_width >= 0) {
           ARROW_ASSIGN_OR_RAISE(
               out->buffers[i + 1],
               AllocateDataBuffer(kernel_ctx_, length + prealloc.added_length,
                                  prealloc.bit_width));
         }
       }
       
       // out->child_data untouched
       return out;
     }
   ```
   
   and then
   
   ```cpp
   arrow::Status UdfExec(cp::KernelContext* ctx, const cp::ExecSpan& batch,
                                  cp::ExecResult* out)
   {
     CHECK_EQ(out->array_data()->buffers.size(), 1); // validity bitmap allocated
     CHECK_EQ(out->array_data()->child_data.size(), 0); // no data for struct fields allocated
     return arrow::Status::OK();
   }
   
   ```
   
   As workaround have to set `kernel.mem_allocation = cp::MemAllocation::NO_PREALLOCATE;` and use regular builders in my kernel:
   
   ```cpp
   arrow::Status UdfExec(cp::KernelContext* ctx, const cp::ExecSpan& batch,
                                  cp::ExecResult* out)
   {
     std::unique_ptr<arrow::ArrayBuilder> builder;
     RETURN_NOT_OK(arrow::MakeBuilder(ctx->memory_pool(),
                                      out->array_data()->type,
                                      &builder));
   
     auto struct_ = dynamic_cast<arrow::StructBuilder*>(builder.get());
     auto x = dynamic_cast<arrow::UInt32Builder*>(builder->child(0));
     auto y = dynamic_cast<arrow::UInt64Builder*>(builder->child(1));
   
     RETURN_NOT_OK(builder->Reserve(batch.length));
     RETURN_NOT_OK(x->Reserve(batch.length));
     RETURN_NOT_OK(y->Reserve(batch.length));
   
     const uint64_t* pn = batch[0].array.GetValues<uint64_t>(1);
     for (int64_t i = 0; i < batch.length; ++i) {
       RETURN_NOT_OK(x->Append(pn[i] % 10));
       RETURN_NOT_OK(y->Append(pn[i] ^ 1231231231231));
       RETURN_NOT_OK(struct_->Append());
     }
   
     RETURN_NOT_OK(builder->FinishInternal(&std::get<std::shared_ptr<arrow::ArrayData>>(out->value)));
     return arrow::Status::OK();
   }
   
   ...
   void RegisterUdf() {
     auto out_type = arrow::struct_({
       arrow::field("x", arrow::uint32()),
       arrow::field("y", arrow::uint64()),
     });
     cp::ScalarKernel kernel({arrow::uint64()}, out_type, UdfExec);
     ...
   }
   ```
   
   I think it would be useful to support such case in kernel's memory preallocation. Or my workaround would help someone else.
   
   ### Component(s)
   
   C++


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] snizovtsev commented on issue #34538: [Compute] Output preallocation for kernels producing structs

Posted by "snizovtsev (via GitHub)" <gi...@apache.org>.
snizovtsev commented on issue #34538:
URL: https://github.com/apache/arrow/issues/34538#issuecomment-1479818234

   Yes, I think I can try to contribute PR for this issue some time later in a spare time. After some changes in a project where I use arrow I don't need to return struct anymore.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on issue #34538: [Compute] Output preallocation for kernels producing structs

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on issue #34538:
URL: https://github.com/apache/arrow/issues/34538#issuecomment-1468673436

   This sounds like a good idea to me.  Are you interested in creating a PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org