Posted to dev@arrow.apache.org by Rares Vernica <rv...@gmail.com> on 2021/07/30 16:55:33 UTC
[C++] Unable to getMutableValues from ArrayData
Hello,
I have a RecordBatch that I read from an IPC file. I need to run a
cumulative sum on one of the int64 arrays in the batch. I tried to do:

    std::shared_ptr<arrow::ArrayData> pos_data = batch->column_data(nAtts);
    auto pos_values = pos_data->GetMutableValues<int64_t>(1);
    for (auto i = 1; i < pos_data->length; i++)
        pos_values[i] += pos_values[i - 1];

but GetMutableValues returns NULL. If I use GetValues, it works fine, but I
can't run the cumulative sum on the returned pointer since it is read-only.
I tried creating a copy of the ArrayData like so:

    std::shared_ptr<arrow::ArrayData> pos_data =
        batch->column_data(nAtts)->Copy();

but it did not help. All this data is in CPU buffers, so I'm not sure what
the problem is. What is the best way to mutate it, or to copy and mutate it?

Thanks!
Rares
Re: [C++] Unable to getMutableValues from ArrayData
Posted by Rares Vernica <rv...@gmail.com>.
Thanks all! I ended up with this and it worked fine:

    std::shared_ptr<arrow::ArrayData> delta_data =
        _arrowBatch->column_data(nAtts);

    // Copy delta to pos
    std::vector<std::shared_ptr<arrow::Buffer>> pos_buffers(2);
    pos_buffers[0] = NULL;  // No nulls in the array
    ASSIGN_OR_THROW(pos_buffers[1],
                    arrow::Buffer::Copy(delta_data->buffers[1],
                                        arrow::default_cpu_memory_manager()),
                    "Copy delta column");
    std::shared_ptr<arrow::ArrayData> pos_data = arrow::ArrayData::Make(
        arrow::int64(),
        delta_data->length,
        pos_buffers,
        delta_data->null_count);
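
The snippet above stops once the copy exists. A minimal sketch of the
remaining step, the running sum itself, could look like the following.
CumulativeSumInPlace is a hypothetical helper, not an Arrow API. It assumes
the caller passes the pointer obtained from
pos_data->GetMutableValues<int64_t>(1) together with pos_data->length, and
that the array has no nulls. It also checks for a null pointer, since that is
what GetMutableValues returned in the original attempt.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an Arrow API): running sum over a writable
// int64 span. Returns false and touches nothing if `values` is null,
// which is the failure mode reported in this thread.
bool CumulativeSumInPlace(int64_t* values, int64_t length) {
  assert(length >= 0);
  if (values == nullptr) return false;
  for (int64_t i = 1; i < length; ++i) {
    values[i] += values[i - 1];  // element i becomes the sum of elements 0..i
  }
  return true;
}
```

Using int64_t for the loop index also avoids the int versus int64_t
comparison that `auto i = 1` produces against pos_data->length.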
Re: [C++] Unable to getMutableValues from ArrayData
Posted by Antoine Pitrou <an...@python.org>.
On Fri, 30 Jul 2021 18:55:33 +0200
Rares Vernica <rv...@gmail.com> wrote:
> Hello,
>
> I have a RecordBatch that I read from an IPC file. I need to run a
> cumulative sum on one of the int64 arrays in the batch. I tried to do:

The ArrayData contents are semantically immutable. You may want to grab
mutable pointers to the underlying data, but you're generally on your
own.

> but GetMutableValues returns NULL. If I use GetValues, it works fine, but I
> can't run the cumulative sum on the returned pointer since it is read-only.
> I tried creating a copy of the ArrayData like so:
>
>     std::shared_ptr<arrow::ArrayData> pos_data =
>         batch->column_data(nAtts)->Copy();

ArrayData::Copy() creates a shallow copy of the ArrayData structure.
We do not include a deep copy facility, because often you only want to
change one of the underlying buffers.

> but it did not help. All this data is in CPU buffers, so I'm not sure what
> the problem is. What is the best way to mutate it, or to copy and mutate it?

You should probably create a new ArrayData with the desired buffers.
To create a buffer, there are several possibilities: allocate it
directly (using AllocateBuffer, for example), use
TypedBufferBuilder<T>, or a hand-rolled method of your choice.

Regards

Antoine.
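
As an illustration of the "new buffer" route suggested above, the cumulative
sum can be written so that it only reads from the source values and writes
into newly allocated storage, which makes a separate copy step unnecessary.
The sketch below uses a std::vector as a stand-in for an Arrow buffer so that
it stays self-contained. CumulativeSumCopy is a hypothetical name, `src`
plays the role of the read-only pointer from GetValues<int64_t>(1), and
wrapping the result in a new ArrayData is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API). Reads through a const pointer
// and writes the running sum into newly allocated storage, so the source
// values are never modified.
std::vector<int64_t> CumulativeSumCopy(const int64_t* src, int64_t length) {
  assert(length >= 0 && (src != nullptr || length == 0));
  std::vector<int64_t> out(static_cast<std::size_t>(length));
  int64_t running = 0;
  for (int64_t i = 0; i < length; ++i) {
    running += src[i];
    out[static_cast<std::size_t>(i)] = running;
  }
  return out;
}
```

With real Arrow buffers the output storage would come from one of the
options listed above instead of a std::vector.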
Re: [C++] Unable to getMutableValues from ArrayData
Posted by Wes McKinney <we...@gmail.com>.
IPC reads do not return mutable buffers right now, so
Buffer::mutable_data will return nullptr. You need to use Buffer::Copy:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer.h#L251

You could potentially use const_cast to get a mutable pointer to a
buffer returned by IPC reads, which are always immutable buffers. We
could potentially add a flag to IpcReadOptions to permit returning
mutable buffers.
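
To make the behaviour described above concrete, here is a toy model of a
buffer that carries a mutability flag. ToyBuffer is purely hypothetical and
is not arrow::Buffer; it only mimics the two points from this message: a
read-only buffer hands out a null mutable pointer, and a deep copy yields
storage that can be written without affecting the original.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a buffer with a mutability flag. NOT arrow::Buffer.
class ToyBuffer {
 public:
  ToyBuffer(std::vector<uint8_t> bytes, bool is_mutable)
      : bytes_(std::move(bytes)), is_mutable_(is_mutable) {}

  const uint8_t* data() const { return bytes_.data(); }

  // Null when the buffer is read-only, mirroring the behaviour described
  // for buffers returned by IPC reads.
  uint8_t* mutable_data() { return is_mutable_ ? bytes_.data() : nullptr; }

  bool is_mutable() const { return is_mutable_; }
  std::size_t size() const { return bytes_.size(); }

  // Deep copy into new storage; the copy is always writable.
  std::shared_ptr<ToyBuffer> Copy() const {
    return std::make_shared<ToyBuffer>(bytes_, /*is_mutable=*/true);
  }

 private:
  std::vector<uint8_t> bytes_;
  bool is_mutable_;
};
```

In this model, checking is_mutable() (or testing mutable_data() against
nullptr) before writing is what separates the failing first attempt from the
copy-based solution.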
Re: [C++] Unable to getMutableValues from ArrayData
Posted by Niranda Perera <ni...@gmail.com>.
Hi Rares,

ArrayData::GetMutableValues returns nullptr if the requested buffer
is not available:
https://github.com/apache/arrow/blob/557a7c63d49aa04508564517c77c71f3657d19ff/cpp/src/arrow/array/data.h#L199

What does nAtts stand for? Could it be that it is out of bounds?

--
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>