Posted to commits@arrow.apache.org by uw...@apache.org on 2018/12/05 18:06:52 UTC
[arrow] branch master updated: ARROW-3929: [Go] improve CSV reader memory usage
This is an automated email from the ASF dual-hosted git repository.
uwe pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 0d0ff75 ARROW-3929: [Go] improve CSV reader memory usage
0d0ff75 is described below
commit 0d0ff7521b2212800a9aeec0c945bf3f4a402f68
Author: Sebastien Binet <bi...@cern.ch>
AuthorDate: Wed Dec 5 19:06:43 2018 +0100
ARROW-3929: [Go] improve CSV reader memory usage
This CL sets ReuseRecord on the underlying encoding/csv reader so that
the memory backing each record is reused from row to row, which reduces
memory pressure on Go's GC.
```
$> benchstat old.txt new.txt
name                                    old time/op   new time/op   delta
Read/rows=10_cols=1_chunks=10-8         39.4µs ±18%   42.6µs ±24%   ~        (p=0.218 n=10+10)
Read/rows=10_cols=10_chunks=10-8        293µs ±23%    280µs ±24%    ~        (p=0.400 n=10+9)
Read/rows=10_cols=100_chunks=10-8       2.72ms ±24%   2.56ms ±20%   ~        (p=0.353 n=10+10)
Read/rows=10_cols=1000_chunks=10-8      24.3ms ± 2%   24.0ms ± 3%   ~        (p=0.059 n=8+9)
Read/rows=100_cols=1_chunks=10-8        74.9µs ±11%   62.1µs ±19%   -17.21%  (p=0.004 n=10+10)
Read/rows=100_cols=10_chunks=10-8       559µs ±21%    474µs ±21%    -15.12%  (p=0.009 n=10+10)
Read/rows=100_cols=100_chunks=10-8      5.53ms ±21%   4.36ms ±16%   -21.27%  (p=0.000 n=10+9)
Read/rows=100_cols=1000_chunks=10-8     41.9ms ± 3%   42.2ms ±13%   ~        (p=0.684 n=10+10)
Read/rows=1000_cols=1_chunks=10-8       421µs ±13%    320µs ±10%    -23.98%  (p=0.000 n=10+10)
Read/rows=1000_cols=10_chunks=10-8      3.24ms ±24%   2.63ms ±15%   -18.77%  (p=0.007 n=10+10)
Read/rows=1000_cols=100_chunks=10-8     33.0ms ±17%   27.0ms ±19%   -18.09%  (p=0.001 n=10+10)
Read/rows=1000_cols=1000_chunks=10-8    219ms ± 1%    211ms ± 2%    -3.81%   (p=0.000 n=9+10)
Read/rows=10000_cols=1_chunks=10-8      3.66ms ±11%   2.91ms ±10%   -20.27%  (p=0.000 n=10+10)
Read/rows=10000_cols=10_chunks=10-8     31.8ms ±16%   25.6ms ±15%   -19.66%  (p=0.000 n=10+10)
Read/rows=10000_cols=100_chunks=10-8    192ms ± 1%    182ms ± 1%    -5.19%   (p=0.000 n=10+10)
Read/rows=10000_cols=1000_chunks=10-8   1.99s ± 1%    1.93s ± 2%    -3.26%   (p=0.000 n=9+9)
Read/rows=100000_cols=1_chunks=10-8     32.9ms ± 4%   26.1ms ± 4%   -20.75%  (p=0.000 n=10+10)
Read/rows=100000_cols=10_chunks=10-8    203ms ± 1%    198ms ± 7%    ~        (p=0.123 n=10+10)
Read/rows=100000_cols=100_chunks=10-8   2.00s ± 1%    1.92s ± 1%    -4.24%   (p=0.000 n=10+8)
Read/rows=100000_cols=1000_chunks=10-8  22.7s ± 2%    22.0s ± 2%    -3.31%   (p=0.000 n=9+10)

name                                    old alloc/op  new alloc/op  delta
Read/rows=10_cols=1_chunks=10-8         32.7kB ± 0%   32.2kB ± 0%   -1.32%   (p=0.000 n=10+10)
Read/rows=10_cols=10_chunks=10-8        281kB ± 0%    277kB ± 0%    -1.54%   (p=0.000 n=10+10)
Read/rows=10_cols=100_chunks=10-8       2.77MB ± 0%   2.73MB ± 0%   -1.58%   (p=0.000 n=10+10)
Read/rows=10_cols=1000_chunks=10-8      27.8MB ± 0%   27.3MB ± 0%   -1.59%   (p=0.000 n=9+9)
Read/rows=100_cols=1_chunks=10-8        44.0kB ± 0%   39.3kB ± 0%   -10.80%  (p=0.000 n=10+10)
Read/rows=100_cols=10_chunks=10-8       381kB ± 0%    333kB ± 0%    -12.48%  (p=0.000 n=10+10)
Read/rows=100_cols=100_chunks=10-8      3.78MB ± 0%   3.29MB ± 0%   -12.75%  (p=0.000 n=10+10)
Read/rows=100_cols=1000_chunks=10-8     37.9MB ± 0%   33.1MB ± 0%   -12.83%  (p=0.000 n=10+9)
Read/rows=1000_cols=1_chunks=10-8       200kB ± 0%    152kB ± 0%    -23.99%  (p=0.000 n=10+10)
Read/rows=1000_cols=10_chunks=10-8      1.84MB ± 0%   1.36MB ± 0%   -26.08%  (p=0.000 n=10+9)
Read/rows=1000_cols=100_chunks=10-8     18.4MB ± 0%   13.5MB ± 0%   -26.44%  (p=0.000 n=9+10)
Read/rows=1000_cols=1000_chunks=10-8    184MB ± 0%    135MB ± 0%    -26.62%  (p=0.000 n=10+10)
Read/rows=10000_cols=1_chunks=10-8      1.65MB ± 0%   1.17MB ± 0%   -29.02%  (p=0.000 n=10+10)
Read/rows=10000_cols=10_chunks=10-8     15.7MB ± 0%   10.9MB ± 0%   -30.65%  (p=0.000 n=10+10)
Read/rows=10000_cols=100_chunks=10-8    156MB ± 0%    108MB ± 0%    -31.12%  (p=0.000 n=10+8)
Read/rows=10000_cols=1000_chunks=10-8   1.58GB ± 0%   1.09GB ± 0%   -31.06%  (p=0.000 n=10+10)
Read/rows=100000_cols=1_chunks=10-8     20.1MB ± 0%   15.3MB ± 0%   -23.93%  (p=0.000 n=10+9)
Read/rows=100000_cols=10_chunks=10-8    197MB ± 0%    149MB ± 0%    -24.39%  (p=0.000 n=10+8)
Read/rows=100000_cols=100_chunks=10-8   1.96GB ± 0%   1.47GB ± 0%   -24.86%  (p=0.000 n=10+10)
Read/rows=100000_cols=1000_chunks=10-8  19.7GB ± 0%   14.7GB ± 0%   -25.00%  (p=0.000 n=10+10)

name                                    old allocs/op new allocs/op delta
Read/rows=10_cols=1_chunks=10-8         319 ± 0%      310 ± 0%      -2.82%   (p=0.000 n=10+10)
Read/rows=10_cols=10_chunks=10-8        2.63k ± 0%    2.62k ± 0%    -0.34%   (p=0.000 n=10+10)
Read/rows=10_cols=100_chunks=10-8       25.7k ± 0%    25.7k ± 0%    -0.04%   (p=0.000 n=10+10)
Read/rows=10_cols=1000_chunks=10-8      256k ± 0%     256k ± 0%     -0.00%   (p=0.000 n=10+10)
Read/rows=100_cols=1_chunks=10-8        524 ± 0%      425 ± 0%      -18.89%  (p=0.000 n=10+10)
Read/rows=100_cols=10_chunks=10-8       3.02k ± 0%    2.92k ± 0%    -3.27%   (p=0.000 n=10+10)
Read/rows=100_cols=100_chunks=10-8      28.0k ± 0%    27.9k ± 0%    -0.35%   (p=0.000 n=10+10)
Read/rows=100_cols=1000_chunks=10-8     277k ± 0%     277k ± 0%     -0.04%   (p=0.000 n=10+10)
Read/rows=1000_cols=1_chunks=10-8       2.43k ± 0%    1.44k ± 0%    -41.04%  (p=0.000 n=10+10)
Read/rows=1000_cols=10_chunks=10-8      5.92k ± 0%    4.92k ± 0%    -16.87%  (p=0.000 n=10+10)
Read/rows=1000_cols=100_chunks=10-8     40.8k ± 0%    39.8k ± 0%    -2.45%   (p=0.000 n=10+10)
Read/rows=1000_cols=1000_chunks=10-8    389k ± 0%     388k ± 0%     -0.26%   (p=0.000 n=10+10)
Read/rows=10000_cols=1_chunks=10-8      20.6k ± 0%    10.6k ± 0%    -48.58%  (p=0.000 n=10+10)
Read/rows=10000_cols=10_chunks=10-8     25.4k ± 0%    15.4k ± 0%    -39.33%  (p=0.000 n=10+10)
Read/rows=10000_cols=100_chunks=10-8    73.8k ± 0%    63.8k ± 0%    -13.56%  (p=0.000 n=10+10)
Read/rows=10000_cols=1000_chunks=10-8   557k ± 0%     547k ± 0%     -1.79%   (p=0.000 n=10+10)
Read/rows=100000_cols=1_chunks=10-8     201k ± 0%     101k ± 0%     -49.78%  (p=0.000 n=10+10)
Read/rows=100000_cols=10_chunks=10-8    208k ± 0%     108k ± 0%     -48.02%  (p=0.000 n=10+10)
Read/rows=100000_cols=100_chunks=10-8   282k ± 0%     182k ± 0%     -35.49%  (p=0.000 n=10+10)
Read/rows=100000_cols=1000_chunks=10-8  1.02M ± 0%    0.92M ± 0%    -9.83%   (p=0.000 n=10+10)
```
Author: Sebastien Binet <bi...@cern.ch>
Closes #3073 from sbinet/issue-3929 and squashes the following commits:
67a3272c <Sebastien Binet> ARROW-3929: improve CSV reader memory usage
8eb60c52 <Sebastien Binet> ARROW-3681: Add benchmarks for CSV reader
---
go/arrow/csv/csv.go | 1 +
1 file changed, 1 insertion(+)
diff --git a/go/arrow/csv/csv.go b/go/arrow/csv/csv.go
index 79f2280..022c46d 100644
--- a/go/arrow/csv/csv.go
+++ b/go/arrow/csv/csv.go
@@ -98,6 +98,7 @@ func NewReader(r io.Reader, schema *arrow.Schema, opts ...Option) *Reader {
 	validate(schema)

 	rr := &Reader{r: csv.NewReader(r), schema: schema, refs: 1, chunk: 1}
+	rr.r.ReuseRecord = true
 	for _, opt := range opts {
 		opt(rr)
 	}