You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "candiduslynx (via GitHub)" <gi...@apache.org> on 2023/06/07 15:47:37 UTC

[GitHub] [arrow] candiduslynx opened a new pull request, #35970: GH-35015: [Go] Fix parquet memleak

candiduslynx opened a new pull request, #35970:
URL: https://github.com/apache/arrow/pull/35970

   ### Rationale for this change
   
   Some memory leaks resulted in partially skipped memory checks in pqarrow package.
   This PR brings the checks back.
   
   ### What changes are included in this PR?
   
   Releases in proper places.
   
   ### Are these changes tested?
   
   Yes, the tests from #35015 are fully enabled now.
   
   ### Are there any user-facing changes?
   
   No.
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35970:
URL: https://github.com/apache/arrow/pull/35970#discussion_r1221859608


##########
go/parquet/pqarrow/column_readers.go:
##########
@@ -117,12 +111,25 @@ func (lr *leafReader) LoadBatch(nrecords int64) (err error) {
 			}
 		}
 	}
-	lr.out, err = transferColumnData(lr.recordRdr, lr.field.Type, lr.descr, lr.rctx.mem)
+	lr.out, err = transferColumnData(lr.recordRdr, lr.field.Type, lr.descr)

Review Comment:
   mem param was unused



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on PR #35970:
URL: https://github.com/apache/arrow/pull/35970#issuecomment-1581171160

   closed in favor of #35971


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35970:
URL: https://github.com/apache/arrow/pull/35970#discussion_r1221846982


##########
go/arrow/memory/checked_allocator.go:
##########
@@ -160,14 +160,24 @@ func (a *CheckedAllocator) AssertSize(t TestingT, sz int) {
 				break
 			}
 			callersMsg.WriteString("\t")
-			callersMsg.WriteString(frame.Function)
-			callersMsg.WriteString(fmt.Sprintf(" line %d", frame.Line))
+			if fn := frame.Func; fn != nil {
+				callersMsg.WriteString(fmt.Sprintf("%s+%x", fn.Name(), frame.PC-fn.Entry()))
+			} else {
+				callersMsg.WriteString(fmt.Sprintf("%s, line %d", frame.Function, frame.Line))
+			}
+			callersMsg.WriteString("\n\t\t")
+			callersMsg.WriteString(frame.File + ":" + strconv.Itoa(frame.Line))

Review Comment:
   this gives a proper fill filename, so the leak is easier to find (IntelliJ IDEA even allows clicking on a link to line)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35970:
URL: https://github.com/apache/arrow/pull/35970#discussion_r1221842825


##########
go/arrow/memory/buffer.go:
##########
@@ -35,7 +35,7 @@ type Buffer struct {
 
 // NewBufferBytes creates a fixed-size buffer from the specified data.
 func NewBufferBytes(data []byte) *Buffer {
-	return &Buffer{refCount: 0, buf: data, length: len(data)}
+	return &Buffer{refCount: 1, buf: data, length: len(data)}

Review Comment:
   As otherwise we have a buffer with no references, IDK if that's intended



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35970:
URL: https://github.com/apache/arrow/pull/35970#discussion_r1221845678


##########
go/arrow/memory/buffer.go:
##########
@@ -35,7 +35,7 @@ type Buffer struct {
 
 // NewBufferBytes creates a fixed-size buffer from the specified data.
 func NewBufferBytes(data []byte) *Buffer {
-	return &Buffer{refCount: 0, buf: data, length: len(data)}
+	return &Buffer{refCount: 1, buf: data, length: len(data)}

Review Comment:
   for the housekeeping (mem & parent are nil still)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35970:
URL: https://github.com/apache/arrow/pull/35970#issuecomment-1581097540

   :warning: GitHub issue #35015 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35970:
URL: https://github.com/apache/arrow/pull/35970#discussion_r1221847214


##########
go/arrow/memory/checked_allocator.go:
##########
@@ -160,14 +160,24 @@ func (a *CheckedAllocator) AssertSize(t TestingT, sz int) {
 				break
 			}
 			callersMsg.WriteString("\t")
-			callersMsg.WriteString(frame.Function)
-			callersMsg.WriteString(fmt.Sprintf(" line %d", frame.Line))
+			if fn := frame.Func; fn != nil {
+				callersMsg.WriteString(fmt.Sprintf("%s+%x", fn.Name(), frame.PC-fn.Entry()))
+			} else {
+				callersMsg.WriteString(fmt.Sprintf("%s, line %d", frame.Function, frame.Line))
+			}
+			callersMsg.WriteString("\n\t\t")
+			callersMsg.WriteString(frame.File + ":" + strconv.Itoa(frame.Line))
 			callersMsg.WriteString("\n")
 			if !more {
 				break
 			}
 		}
-		t.Errorf("LEAK of %d bytes FROM %s line %d\n%v", info.sz, f.Name(), info.line, callersMsg.String())
+		file, line := f.FileLine(info.pc)
+		t.Errorf("LEAK of %d bytes FROM\n\t%s+%x\n\t\t%s:%d\n%v",
+			info.sz, f.Name(), info.pc-f.Entry(),
+			file, line,

Review Comment:
   this gives a proper fill filename, so the leak is easier to find (IntelliJ IDEA even allows clicking on a link to line)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35970:
URL: https://github.com/apache/arrow/pull/35970#discussion_r1221846196


##########
go/arrow/memory/checked_allocator.go:
##########
@@ -160,14 +160,24 @@ func (a *CheckedAllocator) AssertSize(t TestingT, sz int) {
 				break
 			}
 			callersMsg.WriteString("\t")
-			callersMsg.WriteString(frame.Function)
-			callersMsg.WriteString(fmt.Sprintf(" line %d", frame.Line))
+			if fn := frame.Func; fn != nil {
+				callersMsg.WriteString(fmt.Sprintf("%s+%x", fn.Name(), frame.PC-fn.Entry()))

Review Comment:
   offset in bytes from func start



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx closed pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx closed pull request #35970: GH-35015: [Go] Fix parquet memleak
URL: https://github.com/apache/arrow/pull/35970


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] candiduslynx commented on a diff in pull request #35970: GH-35015: [Go] Fix parquet memleak

Posted by "candiduslynx (via GitHub)" <gi...@apache.org>.
candiduslynx commented on code in PR #35970:
URL: https://github.com/apache/arrow/pull/35970#discussion_r1221859030


##########
go/parquet/file/record_reader.go:
##########
@@ -783,19 +783,19 @@ func newByteArrayRecordReader(descr *schema.Column, info LevelInfo, dtype arrow.
 	}}
 }
 
-func (fr *byteArrayRecordReader) ReserveValues(extra int64, hasNullable bool) error {
-	fr.bldr.Reserve(int(extra))
-	return fr.primitiveRecordReader.ReserveValues(extra, hasNullable)
+func (br *byteArrayRecordReader) ReserveValues(extra int64, hasNullable bool) error {

Review Comment:
   just renamed the receiver to have the same name across all methods



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org