You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/05/04 18:22:08 UTC

[GitHub] [beam] lostluck commented on a change in pull request #11517: [BEAM-9643] Adding Go SDF Documentation.

lostluck commented on a change in pull request #11517:
URL: https://github.com/apache/beam/pull/11517#discussion_r419635469



##########
File path: sdks/go/pkg/beam/core/sdf/sdf.go
##########
@@ -13,63 +13,64 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
-// Package sdf is experimental, incomplete, and not yet meant for general usage.
+// Package contains interfaces used specifically for splittable DoFns.
+//
+// Warning: Splittable DoFns are still experimental, largely untested, and
+// likely to have bugs.
 package sdf
 
 // RTracker is an interface used to interact with restrictions while processing elements in
-// SplittableDoFns. Each implementation of RTracker is expected to be used for tracking a single
-// restriction type, which is the type that should be used to create the RTracker, and output by
-// TrySplit.
+// splittable DoFns (specifically, in the ProcessElement method). Each RTracker tracks the progress
+// of a single restriction.
 type RTracker interface {
-	// TryClaim attempts to claim the block of work in the current restriction located at a given
-	// position. This method must be used in the ProcessElement method of Splittable DoFns to claim
-	// work before performing it. If no work is claimed, the ProcessElement is not allowed to perform
-	// work or emit outputs. If the claim is successful, the DoFn must process the entire block. If
-	// the claim is unsuccessful the ProcessElement method of the DoFn must return without performing
-	// any additional work or emitting any outputs.
-	//
-	// TryClaim accepts an arbitrary value that can be interpreted as the position of a block, and
-	// returns a boolean indicating whether the claim succeeded.
+	// TryClaim attempts to claim the block of work located in the given position of the
+	// restriction. This method must be called in ProcessElement to claim work before it can be
+	// processed. Processing work without claiming it first can lead to incorrect output.
 	//
-	// If the claim fails due to an error, that error can be retrieved with GetError.
+	// If the claim is successful, the DoFn must process the entire block. If the claim is
+	// unsuccessful ProcessElement method of the DoFn must return without performing
+	// any additional work or emitting any outputs.
 	//
-	// For SDFs to work properly, claims must always be monotonically increasing in reference to the
-	// restriction's start and end points, and every block of work in a restriction must be claimed.
+	// If the claim fails due to an error, that error is stored and will be automatically emitted
+	// when the RTracker is validated, or can be manually retrieved with GetError.
 	//
 	// This pseudocode example illustrates the typical usage of TryClaim:
 	//
-	// 	pos = position of first block after restriction.start
+	// 	pos = position of first block within the restriction
 	// 	for TryClaim(pos) == true {
 	// 		// Do all work in the claimed block and emit outputs.
-	// 		pos = position of next block
+	// 		pos = position of next block within the restriction
 	// 	}
 	// 	return
 	TryClaim(pos interface{}) (ok bool)
 
-	// GetError returns the error that made this RTracker stop executing, and it returns nil if no
-	// error occurred. If IsDone fails while validating this RTracker, this method will be
-	// called to log the error.
+	// GetError returns the error that made this RTracker stop executing, and returns nil if no
+	// error occurred. This is the error that is emitted if automated validation fails.
 	GetError() error
 
-	// TrySplit splits the current restriction into a primary and residual based on a fraction of the
-	// work remaining. The split is performed along the first valid split point located after the
-	// given fraction of the remainder. This method is called by the SDK harness when receiving a
-	// split request by the runner.
+	// TrySplit splits the current restriction into a primary (currently executing work) and
+	// residual (work to be split off) based on a fraction of work remaining. The split is performed
+	// at the first valid split point located after the given fraction of remaining work.
+	//
+	// For example, a fraction of 0.5 means to split at the halfway point of remaining work only. If
+	// 50% of work is done and 50% remaining, then a fraction of 0.5 would split after 75% of work.
+	//
+	// This method modifies the underlying restriction in the RTracker to reflect the primary. It
+	// then returns a copy of the newly modified restriction as a primary, and returns a new
+	// restriction for the residual. If the split would produce an empty residual (i.e. the only
+	// split point is the end of the restriction), then the returned residual is nil.
 	//
-	// The current restriction is split into two by modifying the current restriction's endpoint to
-	// turn it into the primary, and returning a new restriction tracker representing the residual.
-	// If no valid split point exists, this method returns nil instead of a residual, but does not
-	// return an error. If this method is unable to split due to some error then it returns nil and
-	// an error.
-	TrySplit(fraction float64) (residual interface{}, err error)
+	// If an error is returned, some catastrophic failure occurred and the entire bundle will fail.
+	TrySplit(fraction float64) (primary, residual interface{}, err error)

Review comment:
       Technically since GetError is going to be reading from a location that might be concurrently modified, it's required to be thread safe as well.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org