Posted to user@arrow.apache.org by "Calder, Matthew" <mc...@xbktrading.com> on 2020/01/08 12:49:29 UTC
Copying and memory ownership question
Hi,
I created a minimal cython interface to c++ and I am unsure of whether or not memory is copied and how it is eventually freed. My files are:
--- xbk.hpp ---
#pragma once
#include <arrow/api.h>
namespace xbk {
    std::shared_ptr<arrow::Array> makeArray();
}

--- xbk.cpp ---
#include <vector>
#include "xbk.hpp"
namespace xbk {
    std::shared_ptr<arrow::Array> makeArray()
    {
        std::vector<std::string> v = {"A", "B", "C"};
        arrow::StringBuilder builder;
        builder.AppendValues(v);
        std::shared_ptr<arrow::Array> array;
        builder.Finish(&array);
        return array;
    }
}

--- xbk.pxd ---
from pyarrow.lib cimport *
cdef extern from "xbk.cpp":
    pass
cdef extern from "xbk.hpp" namespace "xbk":
    cdef shared_ptr[CArray] makeArray()

--- xbk_arrow.pyx ---
# distutils: language = c++
from xbk cimport makeArray
from pyarrow.lib cimport *
def makeArrayWrapper():
    a = makeArray()
    return pyarrow_wrap_array(a)

--- caller.py ---
from xbk_arrow import makeArrayWrapper
a = makeArrayWrapper()
f"{a[0]} {a[1]} {a[2]}"

My questions are: when calling makeArrayWrapper from caller.py, is the Array created within the makeArray function copied when it is converted into a Python object? Also, is the memory freed by the Python gc and/or the C++ lib in a timely way? If there is copying or leaking in the above setup, what is the correct way to pass Arrow objects created in C++ libraries back to Python without copying or leaking? I read over https://arrow.apache.org/docs/python/extending.html but I am still unsure. Thanks for any help,
Matt
The information contained in this e-mail may be confidential and is intended solely for the use of the named addressee.
Access, copying or re-use of the e-mail or any information contained therein by any other person is not authorized.
If you are not the intended recipient please notify us immediately by returning the e-mail to the originator.
Disclaimer Version MB.US.1
RE: Copying and memory ownership question
Posted by "Calder, Matthew" <mc...@xbktrading.com>.
Thanks Wes, that's what I was hoping to hear.
Matt
Re: Copying and memory ownership question
Posted by Wes McKinney <we...@gmail.com>.
hi Matt,
> when calling makeArrayWrapper from caller.py is the Array created within the makeArray function copied when it is converted into a python object?
No, the object is managed by a shared_ptr, so wrapping it copies only
the pointer; the underlying object is not copied.
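
To make the "no copy" point concrete, here is a minimal sketch that uses only the C++ standard library. FakeArray, FakeWrapper, makeFakeArray, wrap, and ownersAfterWrap are hypothetical stand-ins invented for illustration (they are not Arrow or pyarrow APIs); the sketch only assumes that the Python wrapper keeps its own shared_ptr to the array, as described above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::Array; it just owns some values.
struct FakeArray {
    std::vector<std::string> values;
};

// Same shape as xbk::makeArray(): build once, hand back a shared_ptr.
std::shared_ptr<FakeArray> makeFakeArray() {
    auto array = std::make_shared<FakeArray>();
    array->values = {"A", "B", "C"};
    return array;
}

// Hypothetical stand-in for the Python wrapper object, which keeps
// its own shared_ptr to the array.
struct FakeWrapper {
    std::shared_ptr<FakeArray> held;
};

// Wrapping copies the shared_ptr (one more owner), never the FakeArray.
FakeWrapper wrap(const std::shared_ptr<FakeArray>& array) {
    return FakeWrapper{array};
}

// Returns the number of owners once the array has been wrapped.
long ownersAfterWrap() {
    std::shared_ptr<FakeArray> a = makeFakeArray();
    FakeWrapper w = wrap(a);
    assert(w.held.get() == a.get());  // same object, not a second copy
    return a.use_count();             // owners: `a` and `w.held`
}
```

Both owners point at the same FakeArray, so the cost of wrapping is one reference-count increment regardless of how large the array is.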
> Also, is the memory freed by the python gc and/or the c++ lib in a timely way?
The memory is released as soon as the underlying array is destructed.
For example, in

    std::shared_ptr<arrow::Array> array;
    builder.Finish(&array);

if you allow "array" to go out of scope, the memory buffers will be
released immediately. You can confirm this by looking at the
MemoryPool* you used when creating the array (here you used
arrow::default_memory_pool()).
> If there is copying or leaking in the above setup, what is the correct way to pass arrow objects created in c++ libraries back to python without copying or leaking
There isn't any copying or leaking in the code you provided -- the
object returned by pyarrow_wrap_array will follow normal Python object
semantics in Cython or Python. As soon as the Python wrapper object is
gc'd, the C++ shared_ptr inside it is destroyed. If it's the only
shared_ptr referencing the array (which it is in your example), then
the C++ object will be destroyed and the memory released.
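
The release order can be sketched the same way with the standard library alone. TrackedArray, liveBytes, and observeRelease are hypothetical names made up for this illustration; liveBytes() merely plays the role of a pool's allocated-bytes counter and is not the Arrow MemoryPool API.

```cpp
#include <array>
#include <cassert>
#include <memory>

// Hypothetical counter standing in for a memory pool's allocated bytes.
long& liveBytes() {
    static long bytes = 0;
    return bytes;
}

// Hypothetical stand-in for an array whose buffers come from that pool.
struct TrackedArray {
    long size;
    explicit TrackedArray(long n) : size(n) { liveBytes() += size; }
    ~TrackedArray() {
        assert(liveBytes() >= size);
        liveBytes() -= size;  // buffers go back when the object dies
    }
    TrackedArray(const TrackedArray&) = delete;
    TrackedArray& operator=(const TrackedArray&) = delete;
};

// Records the counter with two owners, after one is dropped, and after
// the last one is dropped.
std::array<long, 3> observeRelease() {
    std::array<long, 3> seen{};
    auto cpp_side = std::make_shared<TrackedArray>(64);
    std::shared_ptr<TrackedArray> python_side = cpp_side;  // wrapper's reference
    seen[0] = liveBytes();
    cpp_side.reset();     // the C++ local goes away; the wrapper still owns it
    seen[1] = liveBytes();
    python_side.reset();  // the wrapper is collected; last owner is gone
    seen[2] = liveBytes();
    return seen;
}
```

In this model the counter stays at 64 while either owner is alive and drops to 0 only when the last shared_ptr is released, which mirrors the behaviour described for the wrapper object.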
- Wes