Posted to user@arrow.apache.org by Surya Kiran Gullapalli <su...@gmail.com> on 2023/05/04 17:27:43 UTC
[C++] std::vector to Datum
Hello,
I'm trying to use a std::vector (of strings) in CallFunction ('is_in').
The arrow::compute::SetLookupOptions takes a Datum (in my case, an array of
strings to search for).
I tried this:
std::vector<std::string> vec;
auto buffer = arrow::Buffer::Wrap(vec);
auto arrayData = arrow::ArrayData::Make (arrow::utf8(), vec.size(),
{nullptr, buffer});
auto options = arrow::compute::SetLookupOptions(arrayData);
auto res = arrow::compute::CallFunction ("is_in", {arrowArray}, &options);
This results in a crash.
I also tried calling arrow::MakeArray(arrayData), and that fails as well.
But if I convert the std::vector to an arrow::Array (using StringBuilder),
there's no crash and I get the expected results.
Am I using arrow::Buffer/arrow::ArrayData/arrow::Datum correctly, or
am I missing something?
Thanks,
Surya
Re: [C++] std::vector to Datum
Posted by Felipe Oliveira Carvalho <fe...@gmail.com>.
If you control the function that produces the vector<string>, you can avoid
all these fragmented allocations by re-using the same std::string in a loop
and reserving buffers upfront in the builder:
// Reserve() and ReserveData() return a Status, so check them as well.
RETURN_NOT_OK(string_builder.Reserve(number_of_strings));
RETURN_NOT_OK(string_builder.ReserveData(sum_of_lengths_of_all_strings_or_an_estimate_of_that));
std::string s;
for (...) {
  s.clear();  // this doesn't deallocate s's internal buffer
  // ... populate the string s. Avoids a new memory allocation if it is
  // smaller than the biggest string so far.
  RETURN_NOT_OK(string_builder.Append(s));
}
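The buffer-reuse claim in the loop above can be checked without Arrow. The following is a minimal sketch using only the standard library; the helper name CapacitiesAfterClear is hypothetical, and the behavior shown (clear() keeping the allocation) is what mainstream standard library implementations do rather than a strict guarantee of the C++ standard.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the capacity of one reused std::string after each refill.
// Hypothetical helper for illustration; not part of Arrow.
std::vector<std::size_t> CapacitiesAfterClear(
    const std::vector<std::size_t>& lengths) {
  std::vector<std::size_t> capacities;
  std::string s;
  for (std::size_t length : lengths) {
    s.clear();              // size becomes 0; the storage is normally kept
    s.append(length, 'x');  // stand-in for "populate the string s"
    capacities.push_back(s.capacity());
  }
  return capacities;
}
```

Calling it with lengths such as 1000, 10, 500 should report the same capacity three times on common implementations, because only the first and largest string forces an allocation.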
--
Felipe
Re: [C++] std::vector to Datum
Posted by Aldrin <oc...@pm.me>.
To give you a bit of overview that you may be missing, in order of abstraction (high to low):
- Datum is like a wrapper that provides union semantics, in the C sense. For example, it contains an Array or a ChunkedArray or a Table, etc. but one and only one of them.
- Array is like an interface and it stores data in ArrayData
- ArrayData is like a container that owns data (it is responsible for releasing the data) and provides functions to interact with that data
- Buffer is how the data is stored, but it is used for the values, for pointers into the values, and for a bitmap which indicates which values are null (I did not describe these in any particular order)
I didn't find a good spot in the documentation that mentions this, but [1] shows the types that you can/should put into Datum. So, compute functions typically expect Arrays (or something that can be wrapped in Datum); ArrayData is a lower level of abstraction than they're expecting.
[1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/datum.h#L54
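To make the "union semantics" point concrete, here is a toy model built on std::variant. It is only a sketch: ToyArray, ToyChunkedArray, ToyDatum and Length are hypothetical names invented for illustration, and none of this is how arrow::Datum is actually implemented. It just shows a wrapper that holds exactly one of several kinds of value, with a function that accepts whichever kind is stored.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical toy types; they only mimic the roles of the Arrow classes.
struct ToyArray {
  std::vector<std::string> values;
};
struct ToyChunkedArray {
  std::vector<ToyArray> chunks;
};

// Holds exactly one alternative at a time, like a tagged union.
// std::monostate stands for "nothing stored".
using ToyDatum = std::variant<std::monostate, ToyArray, ToyChunkedArray>;

// Counts elements regardless of which alternative is stored.
std::size_t Length(const ToyDatum& datum) {
  if (const auto* array = std::get_if<ToyArray>(&datum)) {
    return array->values.size();
  }
  if (const auto* chunked = std::get_if<ToyChunkedArray>(&datum)) {
    std::size_t total = 0;
    for (const ToyArray& chunk : chunked->chunks) {
      total += chunk.values.size();
    }
    return total;
  }
  return 0;  // nothing stored
}
```

A function written against ToyDatum, like Length here, plays the role of a compute function: callers hand it an array or a chunked array and the wrapper carries whichever one it was given. The sketch needs C++17 for std::variant.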
# ------------------------------
# Aldrin
https://github.com/drin/
https://gitlab.com/octalene
Re: [C++] std::vector to Datum
Posted by Felipe Oliveira Carvalho <fe...@gmail.com>.
std::vector<std::string>::data() returns a pointer to an array of std::string
objects, each of which manages its own separate character storage, whereas
Arrow needs a single buffer of contiguous variable-length character data.
That contiguous data goes in buffers[2]; buffers[1] contains the offsets for
the beginning and end of each string in buffers[2].
So yes, use the StringBuilder.
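For readers who want to see that layout, here is a minimal sketch of the two buffers in plain C++. The names Utf8Layout, Flatten and At are hypothetical and the containers are ordinary standard-library types, not arrow::Buffer objects. The attempt in the question passed only {nullptr, buffer}, that is, two buffers, whereas a utf8 array needs a validity bitmap, an offsets buffer and a data buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Toy stand-in for the offsets and data buffers of a utf8 array.
struct Utf8Layout {
  std::vector<int32_t> offsets;  // plays the role of buffers[1]
  std::string data;              // plays the role of buffers[2]
};

// Copies every string into one contiguous block and records where each
// string starts and ends. There are values.size() + 1 offsets.
Utf8Layout Flatten(const std::vector<std::string>& values) {
  Utf8Layout out;
  out.offsets.reserve(values.size() + 1);
  out.offsets.push_back(0);
  for (const std::string& v : values) {
    // utf8 uses 32-bit offsets, so the running total must fit in int32_t.
    assert(out.data.size() + v.size() <=
           static_cast<std::size_t>(std::numeric_limits<int32_t>::max()));
    out.data += v;
    out.offsets.push_back(static_cast<int32_t>(out.data.size()));
  }
  return out;
}

// Reads element i back out of the flattened form.
std::string At(const Utf8Layout& layout, std::size_t i) {
  const int32_t begin = layout.offsets[i];
  const int32_t end = layout.offsets[i + 1];
  return layout.data.substr(begin, end - begin);
}
```

For the input "ab", "", "cde" this yields the data "abcde" and the offsets 0, 2, 2, 5. StringBuilder does this bookkeeping for you, along with the validity bitmap, which is why it is the simpler route.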
--
Felipe