Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/26 12:41:32 UTC
[GitHub] [arrow-rs] alamb opened a new issue #139: Segmentation fault in Arrow Parquet writer with huge arrays
alamb opened a new issue #139:
URL: https://github.com/apache/arrow-rs/issues/139
*Note*: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-10920
I stumbled across this by chance. I am not too surprised that this fails, but I would expect it to fail gracefully rather than with a segmentation fault.
{code:java}
use std::fs::File;
use std::sync::Arc;

use arrow::array::StringBuilder;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::Result;
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

fn main() -> Result<()> {
    let schema = Schema::new(vec![
        Field::new("c0", DataType::Utf8, false),
        Field::new("c1", DataType::Utf8, true),
    ]);

    // 2.5 million rows; each c1 value is the 32-character c0 value repeated 140 times.
    let batch_size = 2500000;
    let repeat_count = 140;

    let file = File::create("/tmp/test.parquet")?;
    let mut writer = ArrowWriter::try_new(file, Arc::new(schema.clone()), None).unwrap();

    let mut c0_builder = StringBuilder::new(batch_size);
    let mut c1_builder = StringBuilder::new(batch_size);

    println!("Start of loop");
    for i in 0..batch_size {
        // Row index zero-padded to 32 digits.
        let c0_value = format!("{:032}", i);
        let c1_value = c0_value.repeat(repeat_count);
        c0_builder.append_value(&c0_value)?;
        c1_builder.append_value(&c1_value)?;
    }

    println!("Finish building c0");
    let c0 = Arc::new(c0_builder.finish());
    println!("Finish building c1");
    let c1 = Arc::new(c1_builder.finish());

    println!("Creating RecordBatch");
    let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![c0, c1])?;

    // write the batch to parquet
    println!("Writing RecordBatch");
    writer.write(&batch).unwrap();

    println!("Closing writer");
    writer.close().unwrap();
    Ok(())
}
{code}
output:
{code:java}
Start of loop
Finish building c0
Finish building c1
Creating RecordBatch
Writing RecordBatch
Segmentation fault (core dumped)
{code}
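A quick size estimate may explain why this input is problematic. Arrow's `Utf8` type stores value offsets as 32-bit signed integers, so the concatenated bytes of a single string array cannot exceed `i32::MAX` bytes (`LargeUtf8` uses 64-bit offsets instead). The sketch below recomputes the string data sizes from the `batch_size` and `repeat_count` used in the program above; the helper names are illustrative, and attributing the crash to offset overflow is an assumption, not something this report confirms.

```rust
// Sketch: estimate the size of the string data built by the reproduction
// program and compare it with the largest offset a 32-bit-offset (Utf8)
// array can address.

/// Total number of value bytes for `rows` strings of `len` bytes each,
/// or `None` if the product overflows `u64`.
fn total_bytes(rows: u64, len: u64) -> Option<u64> {
    rows.checked_mul(len)
}

/// Returns true if `bytes` bytes of string data can be addressed with i32 offsets.
fn fits_in_i32_offsets(bytes: u64) -> bool {
    bytes <= i32::MAX as u64
}

fn main() {
    let batch_size: u64 = 2_500_000;
    let repeat_count: u64 = 140;
    // `format!("{:032}", i)` yields exactly 32 ASCII digits for these row indices.
    let c0_len: u64 = 32;
    let c1_len = c0_len * repeat_count; // 4_480 bytes per c1 value

    let c0_bytes = total_bytes(batch_size, c0_len).unwrap();
    let c1_bytes = total_bytes(batch_size, c1_len).unwrap();

    println!("c0: {} bytes, fits: {}", c0_bytes, fits_in_i32_offsets(c0_bytes));
    println!("c1: {} bytes, fits: {}", c1_bytes, fits_in_i32_offsets(c1_bytes));
    println!("i32::MAX = {}", i32::MAX);
}
```

Column `c0` needs about 80 MB of string data, well within range, while `c1` needs about 11.2 GB, roughly five times what 32-bit offsets can address.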
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb edited a comment on issue #139: Segmentation fault in Arrow Parquet writer with huge arrays
alamb edited a comment on issue #139:
URL: https://github.com/apache/arrow-rs/issues/139#issuecomment-954650617
When I run this test program, I get an error (not super helpful, but not a segfault either):
```
warning: `rust_parquet` (bin "rust_parquet") generated 2 warnings
Finished release [optimized + debuginfo] target(s) in 1m 22s
Running `/Volumes/RAMDisk/df-target/release/rust_parquet`
Start of loop
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /Users/alamb/Software/arrow-rs/arrow/src/array/builder.rs:925:71
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
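One plausible reading of this panic, assuming the builder converts each new end offset from `usize` to the 32-bit offset type with a checked conversion and then calls `unwrap()` on it: each `c1` value is 32 × 140 = 4,480 bytes, so after enough rows the cumulative length passes `i32::MAX` and the conversion yields `None`. The sketch below is not arrow-rs code; `HypotheticalStringBuilder` and `OffsetOverflow` are made-up names showing how the same check could surface as an error instead of a panic.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical error for a value that would push the end offset past i32::MAX.
#[derive(Debug, PartialEq)]
struct OffsetOverflow {
    attempted_end: usize,
}

impl fmt::Display for OffsetOverflow {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "string data length {} exceeds i32::MAX", self.attempted_end)
    }
}

impl std::error::Error for OffsetOverflow {}

/// Hypothetical stand-in for a Utf8 builder with i32 offsets.
/// Only offsets are tracked; value bytes are counted, not stored, to keep memory small.
struct HypotheticalStringBuilder {
    offsets: Vec<i32>,
    data_len: usize,
}

impl HypotheticalStringBuilder {
    fn new() -> Self {
        HypotheticalStringBuilder { offsets: vec![0], data_len: 0 }
    }

    /// Appends a value, returning an error instead of panicking on offset overflow.
    fn append_value(&mut self, value: &str) -> Result<(), OffsetOverflow> {
        // Saturate so a usize overflow also fails the i32 conversion below.
        let end = self.data_len.saturating_add(value.len());
        // Where `.unwrap()` would panic, map the failed conversion to an error.
        let end_i32 = i32::try_from(end).map_err(|_| OffsetOverflow { attempted_end: end })?;
        self.offsets.push(end_i32);
        self.data_len = end;
        Ok(())
    }
}

fn main() {
    let value = "0".repeat(32 * 140); // one 4_480-byte c1 value
    let mut builder = HypotheticalStringBuilder::new();
    let mut appended = 0usize;
    let err = loop {
        match builder.append_value(&value) {
            Ok(()) => appended += 1,
            Err(e) => break e,
        }
    };
    println!("appended {} values before: {}", appended, err);
    println!("offsets recorded: {}", builder.offsets.len());
}
```

Under these assumptions, 479,349 values of 4,480 bytes fit (ending at offset 2,147,483,520), and the 479,350th would end at 2,147,488,000, which no longer fits in an `i32`. Returning a `Result` lets the caller's `?` propagate the failure, which is the graceful behavior the original report asks for.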