You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/05 14:09:31 UTC

[GitHub] [arrow] zeroshade commented on a diff in pull request #13068: ARROW-16473: [Go] fixing memory leak in serializedPageReader

zeroshade commented on code in PR #13068:
URL: https://github.com/apache/arrow/pull/13068#discussion_r865950160


##########
go/parquet/file/page_reader.go:
##########
@@ -611,8 +610,6 @@ func (p *serializedPageReader) Next() bool {
 			// we don't know this page type, we're allowed to skip non-data pages
 			continue
 		}
-
-		p.buf = memory.NewResizableBuffer(p.mem)

Review Comment:
   Ah I see what was happening. I passed the raw bytes as a *new* buffer to the page itself. So technically even though the accounting in the memory allocator is now correct, if the allocator isn't a Go allocator this will actually end up using the buffer after it's been freed. if you look at line 596, it calls `memory.NewBufferBytes(data)` to pass the buffer to the datapage itself. `NewBufferBytes` constructs a buffer which wraps around a byte slice, but it doesn't attempt to *free* that slice when it is garbage collected and assumes that the Go garbage collector will handle it. 
   
   This is all fine for a byteslice that is allocated by Go itself, but since this is a buffer coming from the resizable buffer allocated using the `memory.Allocator` it's possible that this memory wasn't allocated by Go at all.
   
   Because data pages were always created using `NewBufferBytes` I don't believe I added a `Release` function to them. At this point I think the safest solution is to add a `Release` to the `DataPage` interface and corresponding structs, ensure it gets called by the page reader when it sets the current page to nil or otherwise, and just pass the buffer as is to the page rather than calling `NewBufferBytes`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org