Posted to jira@arrow.apache.org by "Ritchie (Jira)" <ji...@apache.org> on 2020/11/16 13:51:00 UTC
[jira] [Created] (ARROW-10618) Invalid write of size 1 in StringBuilder
Ritchie created ARROW-10618:
-------------------------------
Summary: Invalid write of size 1 in StringBuilder
Key: ARROW-10618
URL: https://issues.apache.org/jira/browse/ARROW-10618
Project: Apache Arrow
Issue Type: Bug
Reporter: Ritchie
h1. What is the problem?
I encounter memory errors when using the safe API of StringBuilder. The exact same code with PrimitiveBuilders does not trigger the issue.
h1. How to reproduce?
I encounter the error when creating multiple builders while reading a very large CSV file. The CSV I used is this Kaggle dataset: https://www.kaggle.com/colinmorris/reddit-usernames
{code:c++}
use arrow::array::StringBuilder;

fn main() {
    let batch_size = 1024;
    let file = std::fs::File::open("users.csv").unwrap();
    let mut rdr = csv::Reader::from_reader(file);

    // to make sure exceeding limit is not the cause of invalid mem write
    let mut builder = StringBuilder::new(batch_size * 2);

    for (i, result) in rdr.records().enumerate() {
        let record = result.unwrap();
        builder.append_value(record.get(0).unwrap()).unwrap();

        if i % batch_size == 0 {
            builder.finish();
            builder = StringBuilder::new(batch_size * 2)
        }
    }
}
{code}
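To show how many values each builder receives in the loop above, here is a minimal std-only sketch of the same flush logic. It does not use arrow or csv; `batch_lengths` is a hypothetical helper and a plain counter stands in for the builder. It only models the control flow and says nothing about what the capacity argument of StringBuilder::new measures.

```rust
/// Returns the number of values each builder would hold at the moment it is
/// finished, for `total` records and a flush whenever `i % batch_size == 0`.
/// Hypothetical helper: a counter stands in for the real builder.
fn batch_lengths(total: usize, batch_size: usize) -> Vec<usize> {
    let mut lengths = Vec::new();
    let mut current = 0usize; // stand-in for the builder's value count
    for i in 0..total {
        current += 1; // mirrors builder.append_value(..)
        if i % batch_size == 0 {
            lengths.push(current); // mirrors builder.finish()
            current = 0; // mirrors replacing the builder with a fresh one
        }
    }
    lengths
}

fn main() {
    let batch_size = 1024;
    let lengths = batch_lengths(5000, batch_size);
    // The first flush happens at i == 0, so the first batch holds one value.
    assert_eq!(lengths[0], 1);
    // No batch ever holds more than batch_size values.
    assert!(lengths.iter().all(|&n| n <= batch_size));
    println!("{:?}", lengths); // prints [1, 1024, 1024, 1024, 1024]
}
```

Under this model the first builder is finished after a single value and every later one after exactly 1024 values, so no builder receives more than batch_size values before it is replaced.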
h2. Cargo.toml
{code:c}
[package]
name = "memcheck"
version = "0.1.0"
authors = ["ritchie46 <ri...@gmail.com>"]
edition = "2018"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
arrow = {version = "2", default_features = false}
csv = "1.1"
{code}
h1. Valgrind output
{code:c}
==11917== Memcheck, a memory error detector
==11917== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==11917== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==11917== Command: ./memcheck
==11917==
==11917== Invalid read of size 1
==11917== at 0x13C022: arrow::util::bit_util::set_bits_raw (bit_util.rs:128)
==11917== by 0x150B71: <arrow::array::builder::BufferBuilder<arrow::datatypes::BooleanType> as arrow::array::builder::BufferBuilderTrait<arrow::datatypes::BooleanType>>::append_n (builder.rs:374)
==11917== by 0x151164: arrow::array::builder::PrimitiveBuilder<T>::append_slice (builder.rs:596)
==11917== by 0x152417: arrow::array::builder::StringBuilder::append_value (builder.rs:1771)
==11917== by 0x12D0ED: memcheck::main (exec.rs:773)
==11917== by 0x12C3CA: core::ops::function::FnOnce::call_once (dfa.rs:794)
==11917== by 0x12CACD: std::sys_common::backtrace::__rust_begin_short_backtrace (dfa.rs:840)
==11917== by 0x1307E0: std::rt::lang_start::{{closure}} (rt.rs:66)
==11917== by 0x25F746: call_once<(),Fn<()>> (function.rs:259)
==11917== by 0x25F746: do_call<&Fn<()>,i32> (panicking.rs:381)
==11917== by 0x25F746: try<i32,&Fn<()>> (panicking.rs:345)
==11917== by 0x25F746: catch_unwind<&Fn<()>,i32> (panic.rs:396)
==11917== by 0x25F746: std::rt::lang_start_internal (rt.rs:51)
==11917== by 0x1307B6: std::rt::lang_start (rt.rs:65)
==11917== by 0x12D369: main (in /home/ritchie46/code/polars/target/debug/memcheck)
==11917== Address 0x5f01e80 is 0 bytes after a block of size 1,024 alloc'd
==11917== at 0x4C31E76: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11917== by 0x4C31F91: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11917== by 0x25E453: aligned_malloc (alloc.rs:95)
==11917== by 0x25E453: alloc (alloc.rs:22)
==11917== by 0x25E453: realloc_fallback (alloc.rs:41)
==11917== by 0x25E453: realloc (alloc.rs:50)
==11917== by 0x25E453: __rdl_realloc (alloc.rs:378)
==11917== by 0x136D5C: alloc::alloc::realloc (alloc.rs:120)
==11917== by 0x13B811: arrow::memory::reallocate (memory.rs:187)
==11917== by 0x188A54: arrow::buffer::MutableBuffer::reserve (buffer.rs:666)
==11917== by 0x150CBD: <arrow::array::builder::BufferBuilder<arrow::datatypes::BooleanType> as arrow::array::builder::BufferBuilderTrait<arrow::datatypes::BooleanType>>::reserve (builder.rs:402)
==11917== by 0x150A1E: <arrow::array::builder::BufferBuilder<arrow::datatypes::BooleanType> as arrow::array::builder::BufferBuilderTrait<arrow::datatypes::BooleanType>>::append_n (builder.rs:371)
==11917== by 0x151164: arrow::array::builder::PrimitiveBuilder<T>::append_slice (builder.rs:596)
==11917== by 0x152417: arrow::array::builder::StringBuilder::append_value (builder.rs:1771)
==11917== by 0x12D0ED: memcheck::main (exec.rs:773)
==11917== by 0x12C3CA: core::ops::function::FnOnce::call_once (dfa.rs:794)
==11917==
==11917== Invalid write of size 1
==11917== at 0x13C024: arrow::util::bit_util::set_bits_raw (bit_util.rs:128)
==11917== by 0x150B71: <arrow::array::builder::BufferBuilder<arrow::datatypes::BooleanType> as arrow::array::builder::BufferBuilderTrait<arrow::datatypes::BooleanType>>::append_n (builder.rs:374)
==11917== by 0x151164: arrow::array::builder::PrimitiveBuilder<T>::append_slice (builder.rs:596)
==11917== by 0x152417: arrow::array::builder::StringBuilder::append_value (builder.rs:1771)
==11917== by 0x12D0ED: memcheck::main (exec.rs:773)
==11917== by 0x12C3CA: core::ops::function::FnOnce::call_once (dfa.rs:794)
==11917== by 0x12CACD: std::sys_common::backtrace::__rust_begin_short_backtrace (dfa.rs:840)
==11917== by 0x1307E0: std::rt::lang_start::{{closure}} (rt.rs:66)
==11917== by 0x25F746: call_once<(),Fn<()>> (function.rs:259)
==11917== by 0x25F746: do_call<&Fn<()>,i32> (panicking.rs:381)
==11917== by 0x25F746: try<i32,&Fn<()>> (panicking.rs:345)
==11917== by 0x25F746: catch_unwind<&Fn<()>,i32> (panic.rs:396)
==11917== by 0x25F746: std::rt::lang_start_internal (rt.rs:51)
==11917== by 0x1307B6: std::rt::lang_start (rt.rs:65)
==11917== by 0x12D369: main (in /home/ritchie46/code/polars/target/debug/memcheck)
==11917== Address 0x5f01e80 is 0 bytes after a block of size 1,024 alloc'd
==11917== at 0x4C31E76: memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11917== by 0x4C31F91: posix_memalign (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==11917== by 0x25E453: aligned_malloc (alloc.rs:95)
==11917== by 0x25E453: alloc (alloc.rs:22)
==11917== by 0x25E453: realloc_fallback (alloc.rs:41)
==11917== by 0x25E453: realloc (alloc.rs:50)
==11917== by 0x25E453: __rdl_realloc (alloc.rs:378)
==11917== by 0x136D5C: alloc::alloc::realloc (alloc.rs:120)
==11917== by 0x13B811: arrow::memory::reallocate (memory.rs:187)
==11917== by 0x188A54: arrow::buffer::MutableBuffer::reserve (buffer.rs:666)
==11917== by 0x150CBD: <arrow::array::builder::BufferBuilder<arrow::datatypes::BooleanType> as arrow::array::builder::BufferBuilderTrait<arrow::datatypes::BooleanType>>::reserve (builder.rs:402)
==11917== by 0x150A1E: <arrow::array::builder::BufferBuilder<arrow::datatypes::BooleanType> as arrow::array::builder::BufferBuilderTrait<arrow::datatypes::BooleanType>>::append_n (builder.rs:371)
==11917== by 0x151164: arrow::array::builder::PrimitiveBuilder<T>::append_slice (builder.rs:596)
==11917== by 0x152417: arrow::array::builder::StringBuilder::append_value (builder.rs:1771)
==11917== by 0x12D0ED: memcheck::main (exec.rs:773)
==11917== by 0x12C3CA: core::ops::function::FnOnce::call_once (dfa.rs:794)
{code}
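To read the trace: both errors are reported in set_bits_raw, at an address "0 bytes after a block of size 1,024". A 1,024-byte packed bitmap has room for 8,192 bits, so bit indices 8192 through 8199 are the first ones that map to byte offset 1024, one byte past the end. The std-only sketch below illustrates just that arithmetic. The helper names are hypothetical, it does not reproduce arrow's internals, and it is not a diagnosis of the root cause.

```rust
/// Byte offset that holds bit `bit_index` in a packed bitmap.
fn byte_offset(bit_index: usize) -> usize {
    bit_index / 8
}

/// Number of bytes needed to store `n_bits` bits.
fn bytes_for_bits(n_bits: usize) -> usize {
    (n_bits + 7) / 8
}

/// Sets bit `i` only if its byte lies inside `buf`.
/// Returns false instead of touching memory past the end.
fn checked_set_bit(buf: &mut [u8], i: usize) -> bool {
    match buf.get_mut(byte_offset(i)) {
        Some(byte) => {
            *byte |= 1 << (i % 8);
            true
        }
        None => false,
    }
}

fn main() {
    // Same size as the allocated block reported in the trace.
    let mut bitmap = vec![0u8; 1024];

    // Bit 8191 is the last bit that fits: byte 1023, highest bit.
    assert!(checked_set_bit(&mut bitmap, 8191));
    assert_eq!(bitmap[1023], 0b1000_0000);

    // Bit 8192 would land at byte offset 1024, just past the block.
    assert_eq!(byte_offset(8192), 1024);
    assert!(!checked_set_bit(&mut bitmap, 8192));

    // Holding 8193 bits needs one more byte than the block provides.
    assert_eq!(bytes_for_bits(8193), 1025);
}
```

An unchecked raw-pointer write at that offset is the kind of access Memcheck flags as "Invalid write of size 1".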
h1. Environment.
Confirmed: an invalid write on Ubuntu 18.04 and a segfault (signal 11) on macOS.
cargo 1.49.0-nightly (d5556aeb8 2020-11-04)