Posted to jira@arrow.apache.org by "yzr (Jira)" <ji...@apache.org> on 2020/11/19 02:50:00 UTC
[jira] [Created] (ARROW-10650) memory leak when read parquet file from hadoop
yzr created ARROW-10650:
---------------------------
Summary: memory leak when read parquet file from hadoop
Key: ARROW-10650
URL: https://issues.apache.org/jira/browse/ARROW-10650
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 2.0.0
Environment: linux
Reporter: yzr
When I use the HDFS interface under the arrow/io folder to access and read parquet files, it leads to a memory leak. Some example code below:
std::shared_ptr<arrow::io::HadoopFileSystem> hdfs = nullptr;
... // set hdfs conf var
msg = arrow::io::HadoopFileSystem::Connect(&conf, &hdfs);
... // check whether the hdfs connection succeeded
std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
... // set file_path var
arrow::Status ret = hdfs->OpenReadable(file_path, &file_reader);
... // check whether the open succeeded
std::unique_ptr<parquet::ParquetFileReader> parquet_reader;
parquet_reader = parquet::ParquetFileReader::Open(file_reader);
... // read data through parquet_reader
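One thing worth double-checking (an assumption on my side, not a confirmed cause): in the snippet above, neither the HdfsReadableFile handle nor the HDFS connection is ever explicitly closed, so libhdfs-side resources may outlive the C++ objects. A minimal sketch of the same flow with explicit cleanup, using the same arrow::io HDFS API (the function name ReadParquetFromHdfs and the surrounding structure are mine, for illustration; it assumes a reachable HDFS cluster and a filled-in HdfsConnectionConfig):

```cpp
#include <arrow/io/hdfs.h>
#include <arrow/status.h>
#include <parquet/file_reader.h>

// Sketch only: the point is the explicit Close()/Disconnect() calls
// at the end, which release libhdfs resources deterministically.
arrow::Status ReadParquetFromHdfs(const arrow::io::HdfsConnectionConfig& conf,
                                  const std::string& file_path) {
  std::shared_ptr<arrow::io::HadoopFileSystem> hdfs;
  ARROW_RETURN_NOT_OK(arrow::io::HadoopFileSystem::Connect(&conf, &hdfs));

  std::shared_ptr<arrow::io::HdfsReadableFile> file_reader;
  ARROW_RETURN_NOT_OK(hdfs->OpenReadable(file_path, &file_reader));

  {
    std::unique_ptr<parquet::ParquetFileReader> parquet_reader =
        parquet::ParquetFileReader::Open(file_reader);
    // ... read data through parquet_reader ...
  }  // parquet_reader is destroyed here, before the file is closed

  // Explicitly release the file handle and the connection so that
  // libhdfs resources are freed even if the shared_ptrs linger.
  ARROW_RETURN_NOT_OK(file_reader->Close());
  ARROW_RETURN_NOT_OK(hdfs->Disconnect());
  return arrow::Status::OK();
}
```

If the leak disappears with explicit Close()/Disconnect(), that would point at dangling handles rather than a bug inside the parquet reader itself.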
By contrast, when I read a parquet file from a local path with
static std::unique_ptr<ParquetFileReader> OpenFile(
    const std::string& path, bool memory_map = true,
    const ReaderProperties& props = default_reader_properties(),
    std::shared_ptr<FileMetaData> metadata = NULLPTR);
using this OpenFile API, everything is OK and there is no memory leak.
Is there a wrong step in how I use the HDFS API, or does an issue exist there?
Please reply with any questions, thanks.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)