Posted to issues@arrow.apache.org by "Sasha Sirovica (Jira)" <ji...@apache.org> on 2022/08/19 05:31:00 UTC

[jira] [Created] (ARROW-17473) [Go] String Binary Builder Leaks Memory When Writing to Parquet

Sasha Sirovica created ARROW-17473:
--------------------------------------

             Summary: [Go] String Binary Builder Leaks Memory When Writing to Parquet
                 Key: ARROW-17473
                 URL: https://issues.apache.org/jira/browse/ARROW-17473
             Project: Apache Arrow
          Issue Type: Bug
          Components: Go
    Affects Versions: 9.0.0
         Environment: Mac
            Reporter: Sasha Sirovica


When a schema uses `arrow.BinaryTypes.String`, appending many strings and then writing each record out to Parquet causes the program's memory usage to grow continuously.

I took a heap dump midway through the run, and the majority of allocations come from `StringBuilder.Append`. Memory usage approached 16 GB before I terminated the program.
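
For anyone who wants to capture a comparable heap dump, the following is a minimal sketch using only the Go standard library. The helper name `writeHeapProfile` and the output path `heap.pprof` are illustrative assumptions; the report does not say how the original dump was taken.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap profile to path.
// The helper name and path are hypothetical, not part of Arrow.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Run a GC first so the profile reflects up-to-date statistics.
	runtime.GC()
	return pprof.WriteHeapProfile(f)
}

func main() {
	// In the reproduction program, a call like this could be placed
	// inside the loop, for example right after a record is written.
	if err := writeHeapProfile("heap.pprof"); err != nil {
		fmt.Fprintln(os.Stderr, "heap profile failed:", err)
		os.Exit(1)
	}
	fmt.Println("wrote heap.pprof")
}
```

The resulting file can be inspected with `go tool pprof -sample_index=inuse_space heap.pprof`, where the `top` command lists the functions holding the most live memory.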

I was not able to reproduce this behavior with only `arrow.PrimitiveTypes` columns. Also, if the records are created but never written with pqarrow, memory does not grow: in the program below, commenting out `w.Write(rec)` avoids the problem.

Example program which causes memory to leak:
{code:go}
package main

import (
   "os"

   "github.com/apache/arrow/go/v9/arrow"
   "github.com/apache/arrow/go/v9/arrow/array"
   "github.com/apache/arrow/go/v9/arrow/memory"
   "github.com/apache/arrow/go/v9/parquet"
   "github.com/apache/arrow/go/v9/parquet/compress"
   "github.com/apache/arrow/go/v9/parquet/pqarrow"
)

func main() {
   f, _ := os.Create("/tmp/test.parquet")

   arrowProps := pqarrow.DefaultWriterProps()
   schema := arrow.NewSchema(
      []arrow.Field{
         {Name: "aString", Type: arrow.BinaryTypes.String},
      },
      nil,
   )
   w, _ := pqarrow.NewFileWriter(schema, f, parquet.NewWriterProperties(parquet.WithCompression(compress.Codecs.Snappy)), arrowProps)

   builder := array.NewRecordBuilder(memory.DefaultAllocator, schema)
   for i := 1; i < 50000000; i++ {
      builder.Field(0).(*array.StringBuilder).Append("HelloWorld!")
      if i%2000000 == 0 {
         // Write a row group out every 2M rows
         rec := builder.NewRecord()
         w.Write(rec)
         rec.Release()
      }
   }
   w.Close()
}{code}
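
To see the growth per row group without a full heap dump, one option is to log the live heap size after each write. The sketch below uses only the standard library and assumes the leaked buffers live on the Go heap, as the heap dump suggests. The helper `heapAllocMiB` is a hypothetical name, and the `retained` slice is only a stand-in for buffers that are never released; it is not Arrow code.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAllocMiB forces a garbage collection and returns the size of the
// live Go heap in MiB. The name is illustrative.
func heapAllocMiB() float64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1 << 20)
}

func main() {
	// Stand-in for memory that stays referenced after each batch.
	var retained [][]byte
	for batch := 1; batch <= 5; batch++ {
		retained = append(retained, make([]byte, 8<<20))
		// In the reproduction program, this line would follow rec.Release().
		fmt.Printf("batch %d: live heap %.1f MiB\n", batch, heapAllocMiB())
	}
	runtime.KeepAlive(retained)
}
```

If the reported number keeps climbing after every `rec.Release()`, something is still holding references to the released buffers. A flat number would instead point at memory outside the Go heap.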


