Posted to jira@arrow.apache.org by "Ahmed Riza (Jira)" <ji...@apache.org> on 2021/02/12 23:03:00 UTC
[jira] [Comment Edited] (ARROW-6154) [Rust] [Parquet] Too many open files (os error 24)
[ https://issues.apache.org/jira/browse/ARROW-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284027#comment-17284027 ]
Ahmed Riza edited comment on ARROW-6154 at 2/12/21, 11:02 PM:
--------------------------------------------------------------
I've come across the same issue. It appears to be in [https://github.com/apache/arrow/blob/master/rust/parquet/src/util/io.rs#L82]. In my case I have a Parquet file with 3000 columns, and the `try_clone` call there eventually fails because too many file handles are open.
Here's a stack trace from `gdb` which leads to the call in `io.rs`:
{code:java}
#0 parquet::util::io::FileSource<std::fs::File>::new<std::fs::File> (fd=0x7ffff7c3fafc, start=807191, length=65536) at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/util/io.rs:82
#1 0x00005555558294ce in parquet::file::serialized_reader::{{impl}}::get_read (self=0x7ffff7c3fafc, start=807191, length=65536)
at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:59
#2 0x000055555590a3fc in parquet::file::footer::parse_metadata<std::fs::File> (chunk_reader=0x7ffff7c3fafc) at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/footer.rs:57
#3 0x0000555555845db1 in parquet::file::serialized_reader::SerializedFileReader<std::fs::File>::new<std::fs::File> (chunk_reader=...)
at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:134
#4 0x0000555555845bb6 in parquet::file::serialized_reader::{{impl}}::try_from (file=...) at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:81
#5 0x0000555555845c4a in parquet::file::serialized_reader::{{impl}}::try_from (path=0x7ffff0000d20) at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:90
#6 0x0000555555845d34 in parquet::file::serialized_reader::{{impl}}::try_from (path="resources/portfolio.parquet/part-00001-33e6c49b-d6cb-4175-bc41-7198fd777d3a-c000.snappy.parquet")
at /home/a/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-3.0.0/src/file/serialized_reader.rs:98
#7 0x000055555577c7f5 in data_rust::parquet::parquet_demo::test::test_read_multiple_files () at /work/rust/data-rust/src/parquet/parquet_demo.rs:103
{code}
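A minimal sketch of the failure mode and one way around it. It assumes the cause is the one named above: each `File::try_clone` call duplicates the OS file descriptor, so one reader per column chunk in a 3000-column file needs roughly 3000 descriptors. That can exceed the per-process limit, which is often 1024 by default on Linux, where os error 24 is `EMFILE`. The sketch is not the parquet crate's code, and the crate's actual fix may differ. The hypothetical `SharedRangeReader` type shares a single `File` through `Rc<RefCell<File>>` and seeks to its own offset before every read, so any number of range readers use one descriptor.

```rust
use std::cell::RefCell;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::rc::Rc;

/// Hypothetical reader over the byte range [start, start + length) of a file.
/// All readers hold an `Rc` to the same `File`, so N readers use one file
/// descriptor, whereas calling `File::try_clone` per reader would use N.
pub struct SharedRangeReader {
    file: Rc<RefCell<File>>,
    pos: u64, // absolute offset of the next byte to read
    end: u64, // absolute offset one past the last byte of the range
}

impl SharedRangeReader {
    pub fn new(file: Rc<RefCell<File>>, start: u64, length: u64) -> Self {
        SharedRangeReader { file, pos: start, end: start + length }
    }
}

impl Read for SharedRangeReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Never read past the end of this reader's range.
        let remaining = self.end.saturating_sub(self.pos);
        let max = (buf.len() as u64).min(remaining) as usize;
        if max == 0 {
            return Ok(0);
        }
        // Other readers move the shared file cursor too, so seek to this
        // reader's own position before every read.
        let mut file = self.file.borrow_mut();
        file.seek(SeekFrom::Start(self.pos))?;
        let n = file.read(&mut buf[..max])?;
        self.pos += n as u64;
        Ok(n)
    }
}
```

Each column reader would be created with `SharedRangeReader::new(Rc::clone(&file), start, length)` in place of a cloned handle. The trade-offs are a seek per read and no sharing across threads. On Unix, positional reads with `std::os::unix::fs::FileExt::read_at` would avoid the shared cursor altogether.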
> [Rust] [Parquet] Too many open files (os error 24)
> --------------------------------------------------
>
> Key: ARROW-6154
> URL: https://issues.apache.org/jira/browse/ARROW-6154
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Yesh
> Priority: Major
>
> I used the Rust `parquet-read` binary to read a deeply nested Parquet file and got the stack trace below. Unfortunately, I won't be able to upload the file.
> {code:java}
> stack backtrace:
> 0: std::panicking::default_hook::{{closure}}
> 1: std::panicking::default_hook
> 2: std::panicking::rust_panic_with_hook
> 3: std::panicking::continue_panic_fmt
> 4: rust_begin_unwind
> 5: core::panicking::panic_fmt
> 6: core::result::unwrap_failed
> 7: parquet::util::io::FileSource<R>::new
> 8: <parquet::file::reader::SerializedRowGroupReader<R> as parquet::file::reader::RowGroupReader>::get_column_page_reader
> 9: <parquet::file::reader::SerializedRowGroupReader<R> as parquet::file::reader::RowGroupReader>::get_column_reader
> 10: parquet::record::reader::TreeBuilder::reader_tree
> 11: parquet::record::reader::TreeBuilder::reader_tree
> 12: parquet::record::reader::TreeBuilder::reader_tree
> 13: parquet::record::reader::TreeBuilder::reader_tree
> 14: parquet::record::reader::TreeBuilder::reader_tree
> 15: parquet::record::reader::TreeBuilder::build
> 16: <parquet::record::reader::RowIter as core::iter::traits::iterator::Iterator>::next
> 17: parquet_read::main
> 18: std::rt::lang_start::{{closure}}
> 19: std::panicking::try::do_call
> 20: __rust_maybe_catch_panic
> 21: std::rt::lang_start_internal
> 22: main
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)