You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/11 19:56:12 UTC

[GitHub] [beam] damccorm opened a new pull request, #22225: Collect heap dump on OOM

damccorm opened a new pull request, #22225:
URL: https://github.com/apache/beam/pull/22225

   **Please** add a meaningful description for your change here
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r920007997


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"
+)
+
+func UploadHeapDump(ctx context.Context, dest string, additionalDataToWrite string) error {
+	heapDumpLoc := "heapDump"
+
+	f, err := os.Create(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	debug.WriteHeapDump(f.Fd())

Review Comment:
   I changed this to periodically collect a profile from the worker process. When the process exits with an error, boot.go checks for the existence of a profile and uploads it if it exists.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck merged pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
lostluck merged PR #22225:
URL: https://github.com/apache/beam/pull/22225


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1183211320

   I also updated the description with a new link to the heap profile being generated in our test. You can inspect it w/ `go too-http=:8082 path/to/profile`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1181011561

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r920353660


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"errors"
+	"os"
+	"runtime"
+	"runtime/pprof"
+	"time"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+)
+
+const (
+	hProfLoc     = "/tmp/hProf"
+	tempHProfLoc = "/tmp/hProf.tmp"
+)
+
+// SampleForHeapProfile checks every second if it should take a heap profile, and if so
+// it takes one and saves it to hProfLoc. A profile will be taken if either:
+// (1) the amount of memory allocated has increased since the last heap profile was taken or
+// (2) it has been 60 seconds since the last heap profile was taken
+func SampleForHeapProfile(ctx context.Context) {
+	var maxAllocatedSoFar uint64
+	maxAllocatedSoFar = 0
+	samplesSkipped := 0
+	for {
+		var m runtime.MemStats
+		runtime.ReadMemStats(&m)
+		if m.Alloc > maxAllocatedSoFar || samplesSkipped >= 60 {
+			samplesSkipped = 0
+			maxAllocatedSoFar = m.Alloc
+			err := saveHeapProfile()
+			log.Warnf(ctx, "err - %v", err)

Review Comment:
   Oops, good catch. 



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"errors"
+	"os"
+	"runtime"
+	"runtime/pprof"
+	"time"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+)
+
+const (
+	hProfLoc     = "/tmp/hProf"
+	tempHProfLoc = "/tmp/hProf.tmp"
+)
+
+// SampleForHeapProfile checks every second if it should take a heap profile, and if so
+// it takes one and saves it to hProfLoc. A profile will be taken if either:
+// (1) the amount of memory allocated has increased since the last heap profile was taken or
+// (2) it has been 60 seconds since the last heap profile was taken
+func SampleForHeapProfile(ctx context.Context) {
+	var maxAllocatedSoFar uint64
+	maxAllocatedSoFar = 0

Review Comment:
   Oh nice, didn't realize that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1183122046

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r919070657


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"
+)
+
+func UploadHeapDump(ctx context.Context, dest string, additionalDataToWrite string) error {
+	heapDumpLoc := "heapDump"
+
+	f, err := os.Create(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	debug.WriteHeapDump(f.Fd())

Review Comment:
   I think you're right. I got my wires crossed as I moved this code around and thought it wasn't showing up because we never successfully allocate the array that causes the OOM. On closer inspection, even if we successfully allocate a large array in advance, it still won't show up correctly. 
   
   Just about the only path forward at this point would seem to be monitoring memory usage from a separate process and uploading a heap profile when we get close to full consumption. That's probably expensive though, and its not 100% clear to me we want to do that at the cost of performance. We at least will want to gate it behind an experiment.
   
   I'll think a little bit harder about this one



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1180855319

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1185596311

   @lostluck FYI, I ended up just adding in the harnessopt changes since they were small enough and that can just totally close this issue


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1185595742

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on a diff in pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
lostluck commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r920297832


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics

Review Comment:
   Please add a Package comment
   
   // Package diagnostics is a beam internal package that contains...
   // This package is not intended for end user use and can change at any time.
   
   Needs to have a blank line between it and the licence and no blank line between it and the `package diagnostics` line.



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"errors"
+	"os"
+	"runtime"
+	"runtime/pprof"
+	"time"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+)
+
+const (
+	hProfLoc     = "/tmp/hProf"
+	tempHProfLoc = "/tmp/hProf.tmp"
+)
+
+// SampleForHeapProfile checks every second if it should take a heap profile, and if so
+// it takes one and saves it to hProfLoc. A profile will be taken if either:
+// (1) the amount of memory allocated has increased since the last heap profile was taken or
+// (2) it has been 60 seconds since the last heap profile was taken
+func SampleForHeapProfile(ctx context.Context) {
+	var maxAllocatedSoFar uint64
+	maxAllocatedSoFar = 0

Review Comment:
   Please remove this line.
   
   All newly declared Go variables are zeroed by default.  
   
   While simply the declaration is preferred, `maxAllocatedSoFar := uint64(0)` is also acceptable.



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"errors"
+	"os"
+	"runtime"
+	"runtime/pprof"
+	"time"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+)
+
+const (
+	hProfLoc     = "/tmp/hProf"
+	tempHProfLoc = "/tmp/hProf.tmp"
+)
+
+// SampleForHeapProfile checks every second if it should take a heap profile, and if so
+// it takes one and saves it to hProfLoc. A profile will be taken if either:
+// (1) the amount of memory allocated has increased since the last heap profile was taken or
+// (2) it has been 60 seconds since the last heap profile was taken
+func SampleForHeapProfile(ctx context.Context) {
+	var maxAllocatedSoFar uint64
+	maxAllocatedSoFar = 0
+	samplesSkipped := 0
+	for {
+		var m runtime.MemStats
+		runtime.ReadMemStats(&m)
+		if m.Alloc > maxAllocatedSoFar || samplesSkipped >= 60 {
+			samplesSkipped = 0
+			maxAllocatedSoFar = m.Alloc
+			err := saveHeapProfile()
+			log.Warnf(ctx, "err - %v", err)
+		} else {
+			samplesSkipped++
+		}
+		// TODO(Issue #21797) - make this value and the samplesSkipped value configurable.
+		time.Sleep(1 * time.Second)
+	}
+}
+
+func saveHeapProfile() error {
+	// Write to a .tmp file before moving to final location to ensure
+	// that OOM doesn't disrupt this flow.
+	// Try removing temp file in case we ran into an error previously
+	if err := os.Remove(tempHProfLoc); err != nil && !errors.Is(err, os.ErrNotExist) {
+		return err
+	}
+	fd, err := os.Create(tempHProfLoc)
+	if err != nil {
+		return err
+	}
+	defer fd.Close()
+	buf := bufio.NewWriterSize(fd, 1<<20) // use 1MB buffer
+
+	err = pprof.WriteHeapProfile(buf)
+	if err != nil {
+		return err
+	}
+
+	if err := buf.Flush(); err != nil {
+		return err
+	}
+
+	if err := os.Remove(hProfLoc); err != nil && !errors.Is(err, os.ErrNotExist) {
+		return err
+	}
+
+	return os.Rename(tempHProfLoc, hProfLoc)
+}
+
+// UploadHeapPrilfe checks if a heap profile is available and uploads it to dest
+// if one is. It will first check hProfLoc for the heap profile and then it will
+// check tempHProfLoc if no file exists at hProfLoc.
+// To use, download the file and run: `go too-http=:8082 path/to/profile`

Review Comment:
   Typo and wordsmithing.
   
   ```suggestion
   // UploadHeapProfile checks if a heap profile is available and, if so, uploads it to dest.
   // It will first check hProfLoc for the heap profile and then it will
   // check tempHProfLoc if no file exists at hProfLoc.
   //
   // To examine, download the file and run: `go tool pprof -http=:8082 path/to/profile`
   ```



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"errors"
+	"os"
+	"runtime"
+	"runtime/pprof"
+	"time"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+)
+
+const (
+	hProfLoc     = "/tmp/hProf"
+	tempHProfLoc = "/tmp/hProf.tmp"
+)
+
+// SampleForHeapProfile checks every second if it should take a heap profile, and if so
+// it takes one and saves it to hProfLoc. A profile will be taken if either:
+// (1) the amount of memory allocated has increased since the last heap profile was taken or
+// (2) it has been 60 seconds since the last heap profile was taken
+func SampleForHeapProfile(ctx context.Context) {
+	var maxAllocatedSoFar uint64
+	maxAllocatedSoFar = 0
+	samplesSkipped := 0
+	for {
+		var m runtime.MemStats
+		runtime.ReadMemStats(&m)
+		if m.Alloc > maxAllocatedSoFar || samplesSkipped >= 60 {
+			samplesSkipped = 0
+			maxAllocatedSoFar = m.Alloc
+			err := saveHeapProfile()
+			log.Warnf(ctx, "err - %v", err)

Review Comment:
   This will log a warning every time, with very little context. Please make the log conditional on whether the error is `nil` or not, and add context like "failed to save heap profile: %v"



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"errors"
+	"os"
+	"runtime"
+	"runtime/pprof"
+	"time"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/log"
+)
+
+const (
+	hProfLoc     = "/tmp/hProf"
+	tempHProfLoc = "/tmp/hProf.tmp"
+)
+
+// SampleForHeapProfile checks every second if it should take a heap profile, and if so
+// it takes one and saves it to hProfLoc. A profile will be taken if either:
+// (1) the amount of memory allocated has increased since the last heap profile was taken or
+// (2) it has been 60 seconds since the last heap profile was taken
+func SampleForHeapProfile(ctx context.Context) {
+	var maxAllocatedSoFar uint64
+	maxAllocatedSoFar = 0
+	samplesSkipped := 0
+	for {
+		var m runtime.MemStats
+		runtime.ReadMemStats(&m)
+		if m.Alloc > maxAllocatedSoFar || samplesSkipped >= 60 {
+			samplesSkipped = 0
+			maxAllocatedSoFar = m.Alloc
+			err := saveHeapProfile()
+			log.Warnf(ctx, "err - %v", err)
+		} else {
+			samplesSkipped++
+		}
+		// TODO(Issue #21797) - make this value and the samplesSkipped value configurable.
+		time.Sleep(1 * time.Second)
+	}
+}
+
+func saveHeapProfile() error {
+	// Write to a .tmp file before moving to final location to ensure
+	// that OOM doesn't disrupt this flow.
+	// Try removing temp file in case we ran into an error previously
+	if err := os.Remove(tempHProfLoc); err != nil && !errors.Is(err, os.ErrNotExist) {
+		return err
+	}
+	fd, err := os.Create(tempHProfLoc)
+	if err != nil {
+		return err
+	}
+	defer fd.Close()
+	buf := bufio.NewWriterSize(fd, 1<<20) // use 1MB buffer
+
+	err = pprof.WriteHeapProfile(buf)
+	if err != nil {
+		return err
+	}

Review Comment:
   To make it consistent with the other ifs.
   ```suggestion
   	if err := pprof.WriteHeapProfile(buf); err != nil {
   		return err
   	}
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r918443656


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"
+)
+
+func UploadHeapDump(ctx context.Context, dest string, additionalDataToWrite string) error {
+	heapDumpLoc := "heapDump"
+
+	f, err := os.Create(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	debug.WriteHeapDump(f.Fd())
+
+	heapDump, err := os.Open(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	defer heapDump.Close()
+	heapDumpReader := bufio.NewReader(heapDump)
+
+	fs, err := filesystem.New(ctx, dest)
+	if err != nil {
+		return err
+	}
+	defer fs.Close()
+	fd, err := fs.OpenWrite(ctx, dest)
+	if err != nil {
+		return err
+	}
+	buf := bufio.NewWriterSize(fd, 1<<20) // use 1MB buffer
+
+	buf.WriteString(additionalDataToWrite)

Review Comment:
   Whoops, forgot I still had that in there - that was for debugging when I was having a hard time getting the temp_directory to show up (I'd forgotten it was still masked in dataflow.go). I just removed it. I think this should also make actually getting the heap dump to show up easier.



##########
sdks/go/container/boot.go:
##########
@@ -115,7 +119,19 @@ func main() {
 		os.Setenv("RUNNER_CAPABILITIES", strings.Join(info.GetRunnerCapabilities(), " "))
 	}
 
-	log.Fatalf("User program exited: %v", execx.Execute(prog, args...))
+	err = execx.Execute(prog, args...)
+
+	if err != nil {
+		var opt runtime.RawOptionsWrapper
+		err = json.Unmarshal([]byte(options), &opt)
+		if err == nil {
+			if tempLocation, ok := opt.Options.Options["temp_location"]; ok {
+				diagnostics.UploadHeapDump(ctx, fmt.Sprintf("%v/heapDumps/dump-%v-%d", strings.TrimSuffix(tempLocation, "/"), *id, time.Now().Unix()), fmt.Sprintf("Options %v", opt))

Review Comment:
   I just removed this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #22225: Collect heap dump on OOM

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1180809962

   # [Codecov](https://codecov.io/gh/apache/beam/pull/22225?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#22225](https://codecov.io/gh/apache/beam/pull/22225?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (a614747) into [master](https://codecov.io/gh/apache/beam/commit/ec47b12cd54bef4632c8a8e0ebca6b88e597c327?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ec47b12) will **decrease** coverage by `0.05%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #22225      +/-   ##
   ==========================================
   - Coverage   74.21%   74.15%   -0.06%     
   ==========================================
     Files         702      703       +1     
     Lines       92829    92901      +72     
   ==========================================
     Hits        68892    68892              
   - Misses      22670    22742      +72     
     Partials     1267     1267              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | go | `51.36% <0.00%> (-0.14%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/22225?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ks/go/pkg/beam/core/runtime/harness/diagnostics.go](https://codecov.io/gh/apache/beam/pull/22225/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvaGFybmVzcy9kaWFnbm9zdGljcy5nbw==) | `0.00% <0.00%> (ø)` | |
   | [sdks/go/pkg/beam/runners/dataflow/dataflow.go](https://codecov.io/gh/apache/beam/pull/22225/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9ydW5uZXJzL2RhdGFmbG93L2RhdGFmbG93Lmdv) | `59.77% <ø> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/beam/pull/22225?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/beam/pull/22225?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ec47b12...a614747](https://codecov.io/gh/apache/beam/pull/22225?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1183210326

   @lostluck this should be good for another round of review. Thanks for bearing with me here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1183512605

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1185621439

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on a diff in pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
lostluck commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r918419721


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"

Review Comment:
   I think this approach is OK if we file a P3 issue noting that it's currently cloud vendor specific.
   
   ---
   
   This is the bit where the container becomes "Assumes Google Cloud" instead of being more flexible, and requires updating/maintenance and a new version (or forces custom containers) if we wanted to support alternative blob stores. It's not very open as a result.
   
   This is where we'd want to *not* have the `gcs` import, and be able to rely on whatever the user program has imported/enabled, so they can write to ec2 or a hdfs, or a network filesystem mount or something.
   
   On the plus side, we could make this dependency explicit in the boot container if we move this _ import into the boot.go instead. This code could also be re-used in any other approach with "restarting" the binary to do the upload.



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"
+)
+
+func UploadHeapDump(ctx context.Context, dest string, additionalDataToWrite string) error {
+	heapDumpLoc := "heapDump"
+
+	f, err := os.Create(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	debug.WriteHeapDump(f.Fd())
+
+	heapDump, err := os.Open(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	defer heapDump.Close()
+	heapDumpReader := bufio.NewReader(heapDump)
+
+	fs, err := filesystem.New(ctx, dest)
+	if err != nil {
+		return err
+	}
+	defer fs.Close()
+	fd, err := fs.OpenWrite(ctx, dest)
+	if err != nil {
+		return err
+	}
+	buf := bufio.NewWriterSize(fd, 1<<20) // use 1MB buffer
+
+	buf.WriteString(additionalDataToWrite)

Review Comment:
   Why are we adding additional information to the heap dump file? Wouldn't this corrupt whatever file format the heap is in, preventing using the tools to analyse it?



##########
sdks/go/container/boot.go:
##########
@@ -115,7 +119,19 @@ func main() {
 		os.Setenv("RUNNER_CAPABILITIES", strings.Join(info.GetRunnerCapabilities(), " "))
 	}
 
-	log.Fatalf("User program exited: %v", execx.Execute(prog, args...))
+	err = execx.Execute(prog, args...)
+
+	if err != nil {
+		var opt runtime.RawOptionsWrapper
+		err = json.Unmarshal([]byte(options), &opt)

Review Comment:
   As written the error from execx.Execute is never read and this error will overwrite it. Suggest changing it to `err := json...` so the original err is shadowed instead and will be available for printing in the final log.Fatal.



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"
+)
+
+func UploadHeapDump(ctx context.Context, dest string, additionalDataToWrite string) error {

Review Comment:
   Always have documentation for Exported functions. In this case: at least point to instructions on how to read the file/make use of it.



##########
sdks/go/container/boot.go:
##########
@@ -115,7 +119,19 @@ func main() {
 		os.Setenv("RUNNER_CAPABILITIES", strings.Join(info.GetRunnerCapabilities(), " "))
 	}
 
-	log.Fatalf("User program exited: %v", execx.Execute(prog, args...))
+	err = execx.Execute(prog, args...)
+
+	if err != nil {
+		var opt runtime.RawOptionsWrapper
+		err = json.Unmarshal([]byte(options), &opt)
+		if err == nil {
+			if tempLocation, ok := opt.Options.Options["temp_location"]; ok {
+				diagnostics.UploadHeapDump(ctx, fmt.Sprintf("%v/heapDumps/dump-%v-%d", strings.TrimSuffix(tempLocation, "/"), *id, time.Now().Unix()), fmt.Sprintf("Options %v", opt))

Review Comment:
   Please add a `\n` after the additional data so it's easier to extract a valid heap dump from the resulting file. As it stands I can't take the sample file, and get pprof to render the heap in question.



##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"
+)
+
+func UploadHeapDump(ctx context.Context, dest string, additionalDataToWrite string) error {
+	heapDumpLoc := "heapDump"
+
+	f, err := os.Create(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	debug.WriteHeapDump(f.Fd())

Review Comment:
   I'm not sure if this is getting what we want now that I'm looking into this.
   
   1. It's not clear to me that we're actually getting the heap from the spawned process. So what are we actually getting? The example heap dump is only ~8MB, which feels closer to what the launcher process is probably using, rather than the worker on death...
   2. AFAICT we might not actually want a debug.WriteHeapDump dump generally anyway. It's the whole heap. Every Byte. Attempting to actually read the linked example dump has been... difficult. https://pkg.go.dev/runtime/pprof#WriteHeapProfile is likely easier for most folks to deal with and will indicate where the leaks are coming from if any. 
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1181148706

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1180869594

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1180914774

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1183233146

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap profile on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1185666961

   Run GoPortable PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1180830413

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1181008530

   R: @lostluck this should be good to go. The failing action is unrelated, it has been [permared for its whole life](https://github.com/apache/beam/actions/workflows/playground_deploy_examples.yml) (I did put up https://github.com/apache/beam/pull/22226 to fix it though 😄)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r918444758


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"

Review Comment:
   Yeah, I agree this isn't as flexible as I'd like. The user can always build their own container if needed, but its non-ideal.
   
   I'm good moving it to boot.go



##########
sdks/go/container/boot.go:
##########
@@ -115,7 +119,19 @@ func main() {
 		os.Setenv("RUNNER_CAPABILITIES", strings.Join(info.GetRunnerCapabilities(), " "))
 	}
 
-	log.Fatalf("User program exited: %v", execx.Execute(prog, args...))
+	err = execx.Execute(prog, args...)
+
+	if err != nil {
+		var opt runtime.RawOptionsWrapper
+		err = json.Unmarshal([]byte(options), &opt)

Review Comment:
   Good catch.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on a diff in pull request #22225: Collect heap dump on OOM on Dataflow

Posted by GitBox <gi...@apache.org>.
lostluck commented on code in PR #22225:
URL: https://github.com/apache/beam/pull/22225#discussion_r919084678


##########
sdks/go/pkg/beam/util/diagnostics/diagnostics.go:
##########
@@ -0,0 +1,68 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package diagnostics
+
+import (
+	"bufio"
+	"context"
+	"os"
+
+	"runtime/debug"
+
+	"github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem"
+	_ "github.com/apache/beam/sdks/v2/go/pkg/beam/io/filesystem/gcs"
+)
+
+func UploadHeapDump(ctx context.Context, dest string, additionalDataToWrite string) error {
+	heapDumpLoc := "heapDump"
+
+	f, err := os.Create(heapDumpLoc)
+	if err != nil {
+		return err
+	}
+	debug.WriteHeapDump(f.Fd())

Review Comment:
   Doing a full debug.WriteHeapDump would definitely be expensive, but a pprof HeapProfile is an allocation sample, and is much cheaper. Technically we do take memory stats when the runner sends a WorkerStatus call, and samples can be taken very regularly without undue performance impact.
   
   But I agree we probably want to centralize some kind of "Thrashing" detection so other parts of the stack could reduce their memory usage dynamically if desired, without everything doing it's own detection.  This mostly means designing with this usage in mind more than getting everything unified from the start.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on pull request #22225: Collect heap dump on OOM

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22225:
URL: https://github.com/apache/beam/pull/22225#issuecomment-1180807238

   Run Go PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org