Posted to user@arrow.apache.org by Maciej Skrzypkowski <m....@gmx.com> on 2020/11/09 08:33:13 UTC

Arrow C++ API - memory management

Hi All!

I don't understand memory management in the Arrow C++ API. I have some
memory leaks while using it. I've created a Stack Overflow question; maybe
someone can answer it:
https://stackoverflow.com/questions/64742588/how-to-manage-memory-while-reading-csv-using-apache-arrow-c-api

Thanks,
Maciej Skrzypkowski


Re: Arrow C++ API - memory management

Posted by Maciej Skrzypkowski <m....@gmx.com>.
I've confirmed that it was my code causing the memory leak. Thanks again
for your help.

On 10.11.2020 17:01, Maciej Skrzypkowski wrote:
> It seems that the memory leak is caused by another part of my code
> (which I thought was fine), not by Arrow. I'll investigate further
> and file an issue if needed.

Re: Arrow C++ API - memory management

Posted by Maciej Skrzypkowski <m....@gmx.com>.
It seems that the memory leak is caused by another part of my code (which
I thought was fine), not by Arrow. I'll investigate further and file an
issue if needed.

On 10.11.2020 03:10, Wes McKinney wrote:
> The memory should be freed automatically when the owning object,
> shared_ptr, or unique_ptr is destroyed. On Linux we use a background
> jemalloc thread by default, so it may not be freed immediately, but it
> should not be held indefinitely. In any case, if you can reproduce the
> issue consistently, we'd be glad to take a look. Please open a Jira
> issue and provide as much information as you can to make it easy for
> us to reproduce.

Re: Arrow C++ API - memory management

Posted by Wes McKinney <we...@gmail.com>.
The memory should be freed automatically when the owning object,
shared_ptr, or unique_ptr is destroyed. On Linux we use a background
jemalloc thread by default, so it may not be freed immediately, but it
should not be held indefinitely. In any case, if you can reproduce the
issue consistently, we'd be glad to take a look. Please open a Jira
issue and provide as much information as you can to make it easy for
us to reproduce.
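
One way to check whether an issue reproduces consistently is to compare an allocation counter before and after many repeated calls. The following is only a standard-library sketch of that idea: Counter, FakeReader, and GrowthAfter are hypothetical names invented for illustration, not Arrow code. In a real program the counter could plausibly be read from arrow::default_memory_pool()->bytes_allocated(), which tracks what the pool has handed out rather than what the allocator has returned to the operating system.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical byte counter standing in for a pool statistic;
// this is not Arrow's MemoryPool.
struct Counter {
    long long bytes = 0;
};

// Hypothetical stand-in for a reader that owns a fixed-size buffer
// and reports its size to the counter for as long as it lives.
class FakeReader {
public:
    explicit FakeReader(Counter& c) : counter_(c), buffer_(4096) {
        counter_.bytes += static_cast<long long>(buffer_.size());
    }
    ~FakeReader() { counter_.bytes -= static_cast<long long>(buffer_.size()); }
    FakeReader(const FakeReader&) = delete;
    FakeReader& operator=(const FakeReader&) = delete;

private:
    Counter& counter_;
    std::vector<char> buffer_;
};

// Runs `op` `iterations` times and returns how far the counter has
// moved from its value before the first call.
long long GrowthAfter(Counter& c, int iterations, const std::function<void()>& op) {
    assert(iterations >= 0);
    const long long baseline = c.bytes;
    for (int i = 0; i < iterations; ++i) {
        op();
    }
    return c.bytes - baseline;
}
```

If op creates a FakeReader in a local shared_ptr and lets it go out of scope, GrowthAfter returns 0 no matter how many iterations run. If op instead stores each shared_ptr in a long-lived container, the growth is proportional to the iteration count, which is the pattern that would be worth reporting.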

On Mon, Nov 9, 2020 at 9:41 AM Maciej Skrzypkowski
<m....@gmx.com> wrote:
>
> OK, thanks for the answer.
>
> mArrowTable is declared as "std::shared_ptr<arrow::Table> mArrowTable", so it should be managed properly by the shared pointer. I've narrowed the problem down to code like this:
>
> void LoadCSVData::ReadArrowTableFromCSV( const std::string & filePath )
> {
>     auto tableReader = CreateTableReader( filePath );
>     //ReadArrowTableUsingReader( *tableReader );
> }
>
> std::shared_ptr<arrow::csv::TableReader> LoadCSVData::CreateTableReader( const std::string & filePath )
> {
>     arrow::MemoryPool* pool = arrow::default_memory_pool();
>     auto tableReader = arrow::csv::TableReader::Make( pool, OpenCSVFile( filePath ),
>                                                       *PrepareReadOptions(), *PrepareParseOptions(), *PrepareConvertOptions() );
>     if ( !tableReader.ok() )
>     {
>         throw BadParametersException( std::string( "CSV file reader error: " ) + tableReader.status().ToString() );
>     }
>     return *tableReader;
> }
>
> Memory still keeps filling up when I call ReadArrowTableFromCSV many times. Is memory from Arrow's memory pool freed when the TableReader is destroyed, or should I free it explicitly?

Re: Arrow C++ API - memory management

Posted by Maciej Skrzypkowski <m....@gmx.com>.
OK, thanks for the answer.

mArrowTable is declared as "std::shared_ptr<arrow::Table> mArrowTable", so
it should be managed properly by the shared pointer. I've narrowed the
problem down to code like this:

void LoadCSVData::ReadArrowTableFromCSV( const std::string & filePath )
{
     auto tableReader = CreateTableReader( filePath );
     //ReadArrowTableUsingReader( *tableReader );
}

std::shared_ptr<arrow::csv::TableReader> LoadCSVData::CreateTableReader( const std::string & filePath )
{
     arrow::MemoryPool* pool = arrow::default_memory_pool();
     auto tableReader = arrow::csv::TableReader::Make( pool, OpenCSVFile( filePath ),
                                                       *PrepareReadOptions(), *PrepareParseOptions(), *PrepareConvertOptions() );
     if ( !tableReader.ok() )
     {
         throw BadParametersException( std::string( "CSV file reader error: " ) + tableReader.status().ToString() );
     }
     return *tableReader;
}

Memory still keeps filling up when I call ReadArrowTableFromCSV many
times. Is memory from Arrow's memory pool freed when the TableReader is
destroyed, or should I free it explicitly?


On 09.11.2020 15:01, Wes McKinney wrote:
> We'd prefer to answer questions on the mailing list or Jira (if
> something looks like a bug).
>
> There isn't enough detail in the SO question to understand what else
> might be going on, but you never destroy this->mArrowTable, which
> holds on to allocated memory. If memory use keeps going up across
> repeated calls to the CSV reader, that sounds like a possible leak,
> so we would need to see more details, including your platform.

Re: Arrow C++ API - memory management

Posted by Wes McKinney <we...@gmail.com>.
We'd prefer to answer questions on the mailing list or Jira (if
something looks like a bug).

There isn't enough detail in the SO question to understand what else
might be going on, but you never destroy this->mArrowTable, which
holds on to allocated memory. If memory use keeps going up across
repeated calls to the CSV reader, that sounds like a possible leak,
so we would need to see more details, including your platform.
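
To illustrate the point about this->mArrowTable, here is a minimal standard-library sketch of how a shared_ptr member keeps its allocation alive until it is reset or reassigned. PoolStats, FakeTable, and Loader are hypothetical stand-ins made up for this example; they only mimic the shape of arrow::Table and the LoadCSVData class, and say nothing about how Arrow is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory pool's byte counter
// (not Arrow's MemoryPool).
struct PoolStats {
    std::size_t bytes_allocated = 0;
};

// Hypothetical stand-in for a table: owns a buffer and reports its
// size to the counter for as long as the object is alive.
class FakeTable {
public:
    FakeTable(PoolStats& stats, std::size_t nbytes)
        : stats_(stats), data_(nbytes) {
        stats_.bytes_allocated += data_.size();
    }
    ~FakeTable() {
        assert(stats_.bytes_allocated >= data_.size());
        stats_.bytes_allocated -= data_.size();
    }
    FakeTable(const FakeTable&) = delete;
    FakeTable& operator=(const FakeTable&) = delete;

private:
    PoolStats& stats_;
    std::vector<char> data_;
};

// Hypothetical loader that keeps the most recent table in a member.
class Loader {
public:
    explicit Loader(PoolStats& stats) : stats_(stats) {}

    // The new table is built first; assigning it then drops the
    // previous one, provided this member was its last owner.
    void Load(std::size_t nbytes) {
        mTable = std::make_shared<FakeTable>(stats_, nbytes);
    }

    // Explicitly gives up ownership without destroying the Loader.
    void Clear() { mTable.reset(); }

private:
    PoolStats& stats_;
    std::shared_ptr<FakeTable> mTable;
};
```

With a PoolStats s and a Loader l(s), the counter reads 100 after l.Load(100), 200 after a following l.Load(200), and 0 after l.Clear(). The memory is held for as long as the member owns the table, which is expected behavior rather than a leak.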

On Mon, Nov 9, 2020 at 2:33 AM Maciej Skrzypkowski
<m....@gmx.com> wrote:
>
> Hi All!
>
> I don't understand memory management in the Arrow C++ API. I have some
> memory leaks while using it. I've created a Stack Overflow question; maybe
> someone can answer it:
> https://stackoverflow.com/questions/64742588/how-to-manage-memory-while-reading-csv-using-apache-arrow-c-api
>
> Thanks,
> Maciej Skrzypkowski
>