Posted to dev@arrow.apache.org by Rares Vernica <rv...@gmail.com> on 2021/07/30 16:55:33 UTC

[C++] Unable to getMutableValues from ArrayData

Hello,

I have a RecordBatch that I read from an IPC file. I need to run a
cumulative sum on one of the int64 arrays in the batch. I tried to do:

std::shared_ptr<arrow::ArrayData> pos_data = batch->column_data(nAtts);
auto pos_values = pos_data->GetMutableValues<int64_t>(1);
for (auto i = 1; i < pos_data->length; i++)
    pos_values[i] += pos_values[i - 1];

but GetMutableValues returns NULL. If I use GetValues, it works fine, but I
can't run the cumulative sum on the returned pointer since it is read-only.
I tried creating a copy of the ArrayData like so:

std::shared_ptr<arrow::ArrayData> pos_data =
batch->column_data(nAtts)->Copy();

but it did not help. All this data is in CPU buffers, so I'm not sure what
the problem is. What is the best way to mutate it, or to copy and mutate it?

Thanks!
Rares

Re: [C++] Unable to getMutableValues from ArrayData

Posted by Rares Vernica <rv...@gmail.com>.
Thanks all! I ended up with this and it worked fine:

        std::shared_ptr<arrow::ArrayData> delta_data =
            _arrowBatch->column_data(nAtts);

        // COPY delta to pos
        std::vector<std::shared_ptr<arrow::Buffer>> pos_buffers(2);
        pos_buffers[0] = NULL; // No nulls in the array
        ASSIGN_OR_THROW(pos_buffers[1],
                        arrow::Buffer::Copy(delta_data->buffers[1],
                                            arrow::default_cpu_memory_manager()),
                        "Copy delta column");
        std::shared_ptr<arrow::ArrayData> pos_data = arrow::ArrayData::Make(
            arrow::int64(),
            delta_data->length,
            pos_buffers,
            delta_data->null_count);

Re: [C++] Unable to getMutableValues from ArrayData

Posted by Antoine Pitrou <an...@python.org>.
On Fri, 30 Jul 2021 18:55:33 +0200
Rares Vernica <rv...@gmail.com> wrote:
> Hello,
> 
> I have a RecordBatch that I read from an IPC file. I need to run a
> cumulative sum on one of the int64 arrays in the batch. I tried to do:

The ArrayData contents are semantically immutable.  You may want to grab
mutable pointers to the underlying data, but you're generally on your
own.

> but GetMutableValues returns NULL. If I use GetValues, it works fine, but I
> can't run the cumulative sum on the returned pointer since it is read-only.
> I tried creating a copy of the ArrayData like so:
> 
> std::shared_ptr<arrow::ArrayData> pos_data =
> batch->column_data(nAtts)->Copy();

ArrayData::Copy() creates a shallow copy of the ArrayData structure.
We do not include a deep copy facility, because often you only want to
change one of the underlying buffers.

> but it did not help. All this data is in CPU buffers so I'm not sure what
> is the problem. What is the best way to mutate or copy & mutate it?

You should probably recreate a new ArrayData with the desired buffers.
To create a buffer, there are several possibilities: allocate it
directly (using AllocateBuffer, for example), use
TypedBufferBuilder<T>, or a hand-rolled method of your choice.

Regards

Antoine.



Re: [C++] Unable to getMutableValues from ArrayData

Posted by Wes McKinney <we...@gmail.com>.
IPC reads do not return mutable buffers right now, so
Buffer::mutable_data will return nullptr. You need to use Buffer::Copy:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/buffer.h#L251

You could potentially use const_cast to get a mutable pointer to a
buffer returned by IPC reads, which are always immutable buffers. We
could also add a flag to IpcReadOptions to permit returning mutable
buffers.

Re: [C++] Unable to getMutableValues from ArrayData

Posted by Niranda Perera <ni...@gmail.com>.
Hi Rares,

ArrayData::GetMutableValues returns nullptr if the requested buffer
is not available.
https://github.com/apache/arrow/blob/557a7c63d49aa04508564517c77c71f3657d19ff/cpp/src/arrow/array/data.h#L199

What does nAtts stand for? Could it be that it is out of bounds (OOB)?


-- 
Niranda Perera
https://niranda.dev/
@n1r44 <https://twitter.com/N1R44>