Posted to jira@arrow.apache.org by "yzr (Jira)" <ji...@apache.org> on 2020/11/19 02:50:00 UTC

[jira] [Created] (ARROW-10650) memory leak when read parquet file from hadoop

yzr created ARROW-10650:
---------------------------

             Summary: memory leak when read parquet file from hadoop
                 Key: ARROW-10650
                 URL: https://issues.apache.org/jira/browse/ARROW-10650
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
    Affects Versions: 2.0.0
         Environment: linux
            Reporter: yzr


When I use the HDFS interface under the arrow/io folder to access and read Parquet files, it leads to a memory leak. Some example code below:

 

std::shared_ptr<arrow::io::HadoopFileSystem> hdfs = nullptr;
// ... set hdfs conf var;
msg = arrow::io::HadoopFileSystem::Connect(&conf, &hdfs);
// ... check whether the HDFS connection succeeded;
std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
// ... set file_path var;
arrow::Status ret = hdfs->OpenReadable(file_path, &file_reader);
// ... check whether the open succeeded;
std::unique_ptr<parquet::ParquetFileReader> parquet_reader;
parquet_reader = parquet::ParquetFileReader::Open(file_reader);
// ... read data through parquet_reader;
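
For what it's worth, below is a minimal, self-contained sketch of the same flow with explicit cleanup added (Close() on the Parquet reader and the HDFS file, Disconnect() on the filesystem). It is only an assumption on my part that the missing cleanup is what shows up as a leak; the host/port settings and the file path are placeholders:

#include <iostream>
#include <memory>

#include <arrow/io/hdfs.h>
#include <arrow/status.h>
#include <parquet/file_reader.h>

int main() {
  // Placeholder connection settings; adjust for your cluster.
  arrow::io::HdfsConnectionConfig conf;
  conf.host = "default";  // assumption: take the namenode from fs.defaultFS
  conf.port = 0;

  std::shared_ptr<arrow::io::HadoopFileSystem> hdfs;
  arrow::Status st = arrow::io::HadoopFileSystem::Connect(&conf, &hdfs);
  if (!st.ok()) {
    std::cerr << "Connect failed: " << st.ToString() << std::endl;
    return 1;
  }

  std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
  st = hdfs->OpenReadable("/tmp/example.parquet", &file_reader);  // placeholder path
  if (!st.ok()) {
    std::cerr << "OpenReadable failed: " << st.ToString() << std::endl;
    return 1;
  }

  {
    std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
        parquet::ParquetFileReader::Open(file_reader);
    std::cout << "num row groups: "
              << parquet_reader->metadata()->num_row_groups() << std::endl;
    // Release Parquet-level resources before closing the underlying file.
    parquet_reader->Close();
  }

  // Explicitly close the HDFS file and drop the connection; leaving them
  // open keeps libhdfs/JNI resources alive, which can look like a leak.
  st = file_reader->Close();
  st = hdfs->Disconnect();
  return st.ok() ? 0 : 1;
}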

 

I also checked reading the Parquet file from a local path using the OpenFile API:

static std::unique_ptr<ParquetFileReader> OpenFile(
    const std::string& path, bool memory_map = true,
    const ReaderProperties& props = default_reader_properties(),
    std::shared_ptr<FileMetaData> metadata = NULLPTR);

That works fine and does not leak memory.
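
For comparison, a minimal sketch of the local-path flow that behaves correctly for me (the file path is a placeholder):

#include <iostream>
#include <memory>

#include <parquet/file_reader.h>

int main() {
  // Open a local Parquet file; memory mapping is the default.
  std::unique_ptr<parquet::ParquetFileReader> reader =
      parquet::ParquetFileReader::OpenFile("/tmp/example.parquet");
  std::cout << "num rows: " << reader->metadata()->num_rows() << std::endl;
  reader->Close();
  return 0;
}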

 

Is there a wrong step in how I use the HDFS API, or is this an actual issue?

If you have any questions, please reply. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)