Posted to jira@arrow.apache.org by "Neville Dipale (Jira)" <ji...@apache.org> on 2021/01/10 06:39:00 UTC
[jira] [Updated] (ARROW-10920) [Rust] Segmentation fault in Arrow Parquet writer with huge arrays
[ https://issues.apache.org/jira/browse/ARROW-10920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neville Dipale updated ARROW-10920:
-----------------------------------
Fix Version/s: 4.0.0
> [Rust] Segmentation fault in Arrow Parquet writer with huge arrays
> ------------------------------------------------------------------
>
> Key: ARROW-10920
> URL: https://issues.apache.org/jira/browse/ARROW-10920
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust
> Reporter: Andy Grove
> Priority: Major
> Fix For: 4.0.0
>
>
> I stumbled across this by chance. I am not too surprised that writing arrays this large fails, but I would expect it to fail gracefully with an error rather than with a segmentation fault.
>
> {code:java}
> use std::fs::File;
> use std::sync::Arc;
>
> use arrow::array::StringBuilder;
> use arrow::datatypes::{DataType, Field, Schema};
> use arrow::error::Result;
> use arrow::record_batch::RecordBatch;
> use parquet::arrow::ArrowWriter;
>
> fn main() -> Result<()> {
>     let schema = Schema::new(vec![
>         Field::new("c0", DataType::Utf8, false),
>         Field::new("c1", DataType::Utf8, true),
>     ]);
>     let batch_size = 2500000;
>     let repeat_count = 140;
>     let file = File::create("/tmp/test.parquet")?;
>     let mut writer = ArrowWriter::try_new(file, Arc::new(schema.clone()), None).unwrap();
>     let mut c0_builder = StringBuilder::new(batch_size);
>     let mut c1_builder = StringBuilder::new(batch_size);
>     println!("Start of loop");
>     for i in 0..batch_size {
>         let c0_value = format!("{:032}", i);
>         let c1_value = c0_value.repeat(repeat_count);
>         c0_builder.append_value(&c0_value)?;
>         c1_builder.append_value(&c1_value)?;
>     }
>     println!("Finish building c0");
>     let c0 = Arc::new(c0_builder.finish());
>     println!("Finish building c1");
>     let c1 = Arc::new(c1_builder.finish());
>     println!("Creating RecordBatch");
>     let batch = RecordBatch::try_new(Arc::new(schema.clone()), vec![c0, c1])?;
>     // write the batch to parquet
>     println!("Writing RecordBatch");
>     writer.write(&batch).unwrap();
>     println!("Closing writer");
>     writer.close().unwrap();
>     Ok(())
> }
> {code}
> output:
> {code:java}
> Start of loop
> Finish building c0
> Finish building c1
> Creating RecordBatch
> Writing RecordBatch
> Segmentation fault (core dumped)
> {code}
>
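For a sense of scale, the sketch below (not part of the original report) works out how many bytes of string data each column in the reproduction holds, and compares that with the largest value a signed 32-bit offset can address, which is the offset width used by `DataType::Utf8` arrays. The helper names `total_value_bytes` and `fits_in_i32_offsets` are hypothetical and exist only for this illustration. Whether exceeding the 32-bit offset range is the actual cause of the segmentation fault is an assumption; the report does not confirm it.

```rust
/// Total bytes of string data in a column where every row holds `value_len` bytes.
/// Returns `None` if the multiplication overflows `u64`.
/// (Hypothetical helper, for illustration only.)
fn total_value_bytes(rows: u64, value_len: u64) -> Option<u64> {
    rows.checked_mul(value_len)
}

/// Whether a byte count can be addressed by a signed 32-bit offset.
/// (Hypothetical helper, for illustration only.)
fn fits_in_i32_offsets(total_bytes: u64) -> bool {
    total_bytes <= i32::MAX as u64
}

fn main() {
    // Values taken from the reproduction in the report.
    let batch_size: u64 = 2_500_000;
    let repeat_count: u64 = 140;
    // format!("{:032}", i) yields 32 bytes for every i below batch_size.
    let c0_len: u64 = 32;
    let c1_len = c0_len * repeat_count;

    let c0_bytes = total_value_bytes(batch_size, c0_len).expect("overflow");
    let c1_bytes = total_value_bytes(batch_size, c1_len).expect("overflow");

    println!(
        "c0: {} bytes, fits in i32 offsets: {}",
        c0_bytes,
        fits_in_i32_offsets(c0_bytes)
    );
    println!(
        "c1: {} bytes, fits in i32 offsets: {}",
        c1_bytes,
        fits_in_i32_offsets(c1_bytes)
    );
}
```

Under these numbers, c0 holds 80,000,000 bytes of string data, while c1 holds 11,200,000,000 bytes, well beyond the 2,147,483,647 limit of a 32-bit signed offset.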