Posted to user@arrow.apache.org by "Calder, Matthew" <mc...@xbktrading.com> on 2020/01/08 12:49:29 UTC

Copying and memory ownership question

Hi,

I created a minimal Cython interface to C++ and I am unsure whether memory is copied and how it is eventually freed. My files are:

--- xbk.hpp ---
#pragma once
#include <arrow/api.h>
namespace xbk {
    std::shared_ptr<arrow::Array> makeArray();
}

--- xbk.cpp ---
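#include <memory>
#include <string>
#include <vector>
#include "xbk.hpp"
namespace xbk {
    // Builds a three-element string array; returns nullptr if Arrow reports an error.
    std::shared_ptr<arrow::Array> makeArray()
    {
        std::vector<std::string> v = {"A", "B", "C"};
        arrow::StringBuilder builder;
        std::shared_ptr<arrow::Array> array;
        // AppendValues and Finish both return arrow::Status; check it rather than drop it.
        if (!builder.AppendValues(v).ok() || !builder.Finish(&array).ok()) {
            return nullptr;
        }
        return array;
    }
}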

--- xbk.pxd ---
from pyarrow.lib cimport *
cdef extern from "xbk.cpp":
    pass
cdef extern from "xbk.hpp" namespace "xbk":
    cdef shared_ptr[CArray] makeArray()

--- xbk_arrow.pyx ---
# distutils: language = c++
from xbk cimport makeArray
from pyarrow.lib cimport *

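def makeArrayWrapper():
    # Declare the C++ type explicitly instead of relying on Cython's type inference.
    cdef shared_ptr[CArray] a = makeArray()
    return pyarrow_wrap_array(a)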

--- caller.py ---
from xbk_arrow import makeArrayWrapper
a = makeArrayWrapper()
print(f"{a[0]} {a[1]} {a[2]}")


My questions are: when calling makeArrayWrapper from caller.py, is the Array created within the makeArray function copied when it is converted into a Python object? Also, is the memory freed by the Python GC and/or the C++ library in a timely way? If there is copying or leaking in the above setup, what is the correct way to pass Arrow objects created in C++ libraries back to Python without copying or leaking? I read over https://arrow.apache.org/docs/python/extending.html but I am still unsure. Thanks for any help,

Matt


The information contained in this e-mail may be confidential and is intended solely for the use of the named addressee.

Access, copying or re-use of the e-mail or any information contained therein by any other person is not authorized.

If you are not the intended recipient please notify us immediately by returning the e-mail to the originator.

Disclaimer Version MB.US.1

RE: Copying and memory ownership question

Posted by "Calder, Matthew" <mc...@xbktrading.com>.
Thanks Wes, that's what I was hoping to hear.

Matt

-----Original Message-----
From: Wes McKinney <we...@gmail.com> 
Sent: Wednesday, January 8, 2020 12:29 PM
To: user@arrow.apache.org
Subject: Re: Copying and memory ownership question


Re: Copying and memory ownership question

Posted by Wes McKinney <we...@gmail.com>.
Hi Matt,

> when calling makeArrayWrapper from caller.py is the Array created within the makeArray function copied when it is converted into a python object?

No. The array is managed by a shared_ptr, so wrapping it copies only
the pointer; the underlying array data is not copied.
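
To make the "not copied" point concrete, here is a minimal sketch that uses only the C++ standard library. FakeArray and Wrapper are hypothetical stand-ins (not Arrow or pyarrow types) for arrow::Array and for an object that stores its own shared_ptr; the assumption is only that the real wrapper holds a shared_ptr in the same way.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::Array; only the ownership behaviour matters.
struct FakeArray {
    std::vector<std::string> values;
};

// Mirrors the shape of makeArray(): build once, hand back a shared_ptr.
std::shared_ptr<FakeArray> makeFakeArray() {
    auto arr = std::make_shared<FakeArray>();
    arr->values = {"A", "B", "C"};
    return arr;
}

// Hypothetical stand-in for a wrapper object that keeps its own shared_ptr.
struct Wrapper {
    std::shared_ptr<FakeArray> held;
};

// True when the wrapper and the original pointer refer to the same object.
bool wrapSharesStorage() {
    std::shared_ptr<FakeArray> a = makeFakeArray();
    Wrapper w{a};  // copies the shared_ptr, not the FakeArray
    // Same address and two owners: the data itself was not duplicated.
    return w.held.get() == a.get() && a.use_count() == 2;
}
```

Copying a shared_ptr only increments a reference count, which is why handing the array to a wrapper costs the same regardless of how large the array is.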

> Also, is the memory freed by the python gc and/or the c++ lib in a timely way?

The memory is released as soon as the underlying array is destructed.
For example, in

    std::shared_ptr<arrow::Array> array;
    builder.Finish(&array);

If you allow "array" to go out of scope without returning it or
copying the shared_ptr elsewhere, the memory buffers are released
immediately. You can confirm this by checking bytes_allocated() on the
MemoryPool* you used when creating the array (here,
arrow::default_memory_pool()).
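
The pool check can be illustrated without Arrow. The sketch below uses a hypothetical ToyPool and ToyBuffer (neither is an Arrow class) to show the pattern: read the pool's byte count while the only shared_ptr is alive, then again after it leaves scope. With Arrow itself, the corresponding reading would come from arrow::default_memory_pool()->bytes_allocated().

```cpp
#include <cstdint>
#include <memory>

// Hypothetical toy pool: tracks live bytes the way a real pool's statistics would.
struct ToyPool {
    std::int64_t bytes_allocated = 0;
};

// Hypothetical buffer that is counted against the pool for as long as it lives.
class ToyBuffer {
public:
    ToyBuffer(ToyPool* pool, std::int64_t size) : pool_(pool), size_(size) {
        pool_->bytes_allocated += size_;
    }
    ~ToyBuffer() { pool_->bytes_allocated -= size_; }
    ToyBuffer(const ToyBuffer&) = delete;
    ToyBuffer& operator=(const ToyBuffer&) = delete;

private:
    ToyPool* pool_;
    std::int64_t size_;
};

// Pool readings taken while the buffer is alive and after it is gone.
struct PoolReadings {
    std::int64_t during;
    std::int64_t after;
};

PoolReadings measureScope(ToyPool& pool) {
    PoolReadings r{0, 0};
    {
        std::shared_ptr<ToyBuffer> buf = std::make_shared<ToyBuffer>(&pool, 64);
        r.during = pool.bytes_allocated;
    }  // the only owner is destroyed here, so the destructor runs immediately
    r.after = pool.bytes_allocated;
    return r;
}
```

The release happens at the closing brace, not at some later collection point, which is the behaviour described above for the array's buffers.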

> If there is copying or leaking in the above setup,  what is the correct way to pass arrow objects created in c++ libraries back to python without copying or leaking

There isn't any copying or leaking in the code you provided. The
object returned by pyarrow_wrap_array follows normal Python object
semantics in Cython or Python. As soon as the Python wrapper object is
garbage-collected, the C++ shared_ptr inside it is destroyed. If it's
the only shared_ptr referencing the array (which it is in your example),
then the C++ object is destroyed and the memory released.
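
The "only shared_ptr" condition matters, and a small standard-library sketch shows both cases. FakeArray and FakeWrapper are hypothetical stand-ins, with FakeWrapper playing the role of the Python wrapper object; a std::weak_ptr is used purely to observe whether the array still exists.

```cpp
#include <memory>

// Hypothetical stand-in for arrow::Array.
struct FakeArray {
    int length = 3;
};

// Hypothetical stand-in for the Python wrapper, which holds one shared_ptr.
struct FakeWrapper {
    std::shared_ptr<FakeArray> held;
};

// Case 1: the wrapper is the sole owner, so the array dies with it.
bool releasedWhenSoleOwnerDies() {
    std::weak_ptr<FakeArray> watcher;
    {
        FakeWrapper wrapper{std::make_shared<FakeArray>()};
        watcher = wrapper.held;  // observes without owning
    }  // wrapper destroyed, taking the last shared_ptr with it
    return watcher.expired();
}

// Case 2: a second shared_ptr keeps the array alive after the wrapper is gone.
bool survivesWhileAnotherOwnerExists() {
    std::shared_ptr<FakeArray> other;
    std::weak_ptr<FakeArray> watcher;
    {
        FakeWrapper wrapper{std::make_shared<FakeArray>()};
        other = wrapper.held;
        watcher = wrapper.held;
    }
    bool aliveAfterWrapper = !watcher.expired();
    other.reset();  // dropping the remaining owner finally frees the array
    return aliveAfterWrapper && watcher.expired();
}
```

If the C++ library keeps its own shared_ptr to an array it has handed to Python, the second case applies and the memory stays allocated until that copy is dropped as well.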

- Wes
