Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/14 10:26:40 UTC

[GitHub] [arrow-rs] rockyzhengwu opened a new issue, #2875: ArrowWriter memory leak

rockyzhengwu opened a new issue, #2875:
URL: https://github.com/apache/arrow-rs/issues/2875

   **Describe the bug**
   We use ArrowWriter and found a memory leak. I wrote some sample code and profiled it with bytehound:
   ```
   Total: 1
   Leaked: 1
   Backtrace:
   #00 [mem-leak] _start
   #01 [libc.so.6] __libc_start_main
   #02 [libc.so.6] 7f7666c98d8f
   #03 [mem-leak] main
   #16 [mem-leak] mem_leak::main [main.rs:15]
   #17 [mem-leak] parquet::arrow::arrow_writer::ArrowWriter<W>::close [mod.rs:234]
   #18 [mem-leak] parquet::arrow::arrow_writer::ArrowWriter<W>::flush [mod.rs:161]
   #19 [mem-leak] parquet::arrow::arrow_writer::ArrowWriter<W>::flush_rows [mod.rs:217]
   #20 [mem-leak] parquet::arrow::arrow_writer::write_leaves [mod.rs:273]
   #21 [mem-leak] parquet::file::writer::SerializedRowGroupWriter<W>::next_column [writer.rs:427]
   #22 [mem-leak] parquet::file::writer::SerializedRowGroupWriter<W>::next_column_with_factory [writer.rs:415]
   #23 [mem-leak] parquet::file::writer::SerializedRowGroupWriter<W>::next_column::{{closure}} [writer.rs:428]
   #24 [mem-leak] parquet::column::writer::get_column_writer [mod.rs:78]
   #25 [mem-leak] parquet::column::writer::GenericColumnWriter<E>::new [mod.rs:225]
   #26 [mem-leak] <parquet::column::writer::encoder::ColumnValueEncoderImpl<T> as parquet::column::writer::encoder::ColumnValueEncoder>::try_new [encoder.rs:167]
   #27 [mem-leak] core::bool::<impl bool>::then [bool.rs:71]
   #28 [mem-leak] <parquet::column::writer::encoder::ColumnValueEncoderImpl<T> as parquet::column::writer::encoder::ColumnValueEncoder>::try_new::{{closure}} [encoder.rs:167]
   #29 [mem-leak] parquet::encodings::encoding::dict_encoder::DictEncoder<T>::new [dict_encoder.rs:92]
   #30 [mem-leak] parquet::util::interner::Interner<S>::new [interner.rs:56]
   #31 [mem-leak] <ahash::random_state::RandomState as core::default::Default>::default [interner.rs:56]
   #32 [mem-leak] ahash::random_state::RandomState::new [random_state.rs:216]
   #33 [mem-leak] ahash::random_state::get_fixed_seeds [random_state.rs:216]
   #34 [mem-leak] once_cell::race::once_box::OnceBox<T>::get_or_init [race.rs:256]
   #35 [mem-leak] once_cell::race::once_box::OnceBox<T>::get_or_try_init [race.rs:276]
   #36 [mem-leak] once_cell::race::once_box::OnceBox<T>::get_or_init::{{closure}} [race.rs:256]
   #37 [mem-leak] ahash::random_state::get_fixed_seeds::{{closure}} [random_state.rs:78]
   #38 [mem-leak] alloc::boxed::Box<T>::new [random_state.rs:78]
   #39 [mem-leak] alloc::alloc::exchange_malloc [alloc.rs:330]
   #40 [mem-leak] <alloc::alloc::Global as core::alloc::Allocator>::allocate [alloc.rs:330]
   #41 [mem-leak] alloc::alloc::Global::alloc_impl [alloc.rs:181]
   #42 [mem-leak] alloc::alloc::alloc [alloc.rs:181]
   ```
   
   **To Reproduce**
   A minimal reproduction:
   ```rust
   use arrow::array::{ArrayRef, Int64Array};
   use arrow::record_batch::RecordBatch;
   use parquet::arrow::ArrowWriter;
   use std::sync::Arc;
   
   fn main() {
       // Build a one-column batch holding three Int64 values.
       let col = Arc::new(Int64Array::from_iter_values([1, 2, 3])) as ArrayRef;
       let to_write = RecordBatch::try_from_iter([("col", col)]).unwrap();
   
       // Write the batch as Parquet into an in-memory buffer, then finalize the file.
       let mut buffer = Vec::new();
       let mut writer = ArrowWriter::try_new(&mut buffer, to_write.schema(), None).unwrap();
       writer.write(&to_write).unwrap();
       writer.close().unwrap();
   }
   ```
   I built the Rust compiler myself with `debuginfo-level = 1` in its config, and compiled the example code with this Cargo profile:
   ```toml
   [profile.release]
   debug = 1
   ```
   
   Bytehound script used for the analysis:
   ```
   let groups = allocations()
       .only_leaked()
       .group_by_backtrace()
       .sort_by_size();
   
   graph().add(groups).save();
   fn analyze_group(list) {
       let list_all = allocations().only_matching_backtraces(list);
   
       graph()
           .add("Leaked", list_all.only_leaked())
           .add("Temporary", list_all)
           .save();
   
       println("Total: {}", list_all.len());
       println("Leaked: {}", list_all.only_leaked().len());
       println();
       println("Backtrace:");
       println(list_all[0].backtrace().strip());
   }
   analyze_group(groups[0]);
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] rockyzhengwu closed issue #2875: AHash Statically Allocates 64 bytes

Posted by GitBox <gi...@apache.org>.
rockyzhengwu closed issue #2875: AHash Statically Allocates 64 bytes
URL: https://github.com/apache/arrow-rs/issues/2875
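The 64-byte figure in the new title matches a boxed table of two arrays of four `u64` seeds. This sketch assumes that is the shape of the value `ahash::random_state::get_fixed_seeds` initializes. The assumption comes from our reading of ahash's source and is not stated in this thread. `SeedTable` is a hypothetical alias used only for this check of the arithmetic:

```rust
use std::mem::size_of;

// Hypothetical alias for the assumed seed-table layout:
// two arrays of four 64-bit seeds each.
type SeedTable = [[u64; 4]; 2];

fn main() {
    // 2 arrays x 4 seeds x 8 bytes per u64 = 64 bytes, matching the issue title.
    assert_eq!(size_of::<SeedTable>(), 2 * 4 * size_of::<u64>());
    println!("{} bytes", size_of::<SeedTable>()); // prints "64 bytes"
}
```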


[GitHub] [arrow-rs] rockyzhengwu commented on issue #2875: AHash Statically Allocates 64 bytes

Posted by GitBox <gi...@apache.org>.
rockyzhengwu commented on issue #2875:
URL: https://github.com/apache/arrow-rs/issues/2875#issuecomment-1279610447

   @tustvold Thanks for your reply. There isn't an issue here; bytehound just considers this a memory leak.


[GitHub] [arrow-rs] tustvold commented on issue #2875: ArrowWriter memory leak

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2875:
URL: https://github.com/apache/arrow-rs/issues/2875#issuecomment-1279461448

   Following the stack trace, this would appear to be a static initialization within `ahash`. The nature of `once_cell`'s `OnceBox` is that it is initialized once and then reused indefinitely, and I would not expect it to ever be explicitly freed. Is bytehound considering this a memory leak?
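Here is a sketch of the pattern described in that comment. A process-wide value is boxed on first use and then reused for the life of the program. It allocates exactly once, and that allocation is still live at exit, which is what a leak profiler reports. This is a hypothetical stand-in, not ahash's code. It uses std's `OnceLock` (Rust 1.70+) instead of `once_cell::race::OnceBox`, assumes a `[[u64; 4]; 2]` seed table with placeholder values, and counts live heap allocations with a wrapper around `System`:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Wraps the system allocator and tracks how many heap allocations are
// currently live (allocated but not yet freed).
struct Counting;

static LIVE: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE.fetch_add(1, Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        LIVE.fetch_sub(1, Ordering::SeqCst);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

// Hypothetical process-wide seed table: boxed on first use, then reused
// and never dropped, so it is still allocated when the process exits.
static SEEDS: OnceLock<Box<[[u64; 4]; 2]>> = OnceLock::new();

fn seeds() -> &'static [[u64; 4]; 2] {
    // Fixed placeholder values; a real hasher would fill these randomly.
    SEEDS.get_or_init(|| Box::new([[0x9E37_79B9_7F4A_7C15; 4]; 2]))
}

fn main() {
    let before = LIVE.load(Ordering::SeqCst);
    let first = seeds();
    for _ in 0..1_000 {
        // Every later call hands back the same boxed table; nothing new is allocated.
        assert!(std::ptr::eq(seeds(), first));
    }
    let growth = LIVE.load(Ordering::SeqCst) - before;
    println!("live allocations grew by {growth}");
}
```

On a typical single-threaded run, the live count should grow by one however many times `seeds()` is called. A genuine leak would instead grow with the number of calls. Running the reproduction in a loop under bytehound and checking whether the leaked count stays at 1 is one way to tell the two cases apart.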

