You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/07 21:53:55 UTC

[GitHub] [beam] robertwb commented on a diff in pull request #17726: [BEAM-3221] Improve documentation around split request and response

robertwb commented on code in PR #17726:
URL: https://github.com/apache/beam/pull/17726#discussion_r891736087


##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -438,42 +441,77 @@ message ProcessBundleSplitRequest {
     // possible and returning the remainder).
     double fraction_of_remainder = 1;
 
-    // A set of allowed element indices where the SDK may split. When this is
-    // empty, there are no constraints on where to split.
+    // (Optional) A set of allowed element indices where the SDK may split. When
+    // this is empty, there are no constraints on where to split.
     repeated int64 allowed_split_points = 3;
 
-    // (Required for GrpcRead operations) Number of total elements expected
-    // to be sent to this GrpcRead operation, required to correctly account
-    // for unreceived data when determining where to split.
+    // (Required for gRPC Read operation transforms) Number of total elements
+    // expected to be sent to this GrpcRead operation, required to correctly
+    // account for unreceived data when determining where to split.
     int64 estimated_input_elements = 2;
-
-    // TODO(SDF): Allow providing weights rather than sizes.
   }
 
   // (Required) Specifies the desired split for each transform.
   //
-  // Currently only splits at GRPC read operations are supported.
+  // Currently only splits at gRPC read operations are supported.
   // This may, of course, limit the amount of work downstream operations
   // receive.
   map<string, DesiredSplit> desired_splits = 3;
 }
 
-// Represents a partition of the bundle: a "primary" and
-// a "residual", with the following properties:
+// Represents a partition of the bundle: a "primary" and a "residual", with the
+// following properties:
 // - The work in primary and residual doesn't overlap, and combined, adds up
 //   to the work in the current bundle if the split hadn't happened.
 // - The current bundle, if it keeps executing, will have done none of the
-//   work under residual_roots.
+//   work under residual_roots and none of the elements in the
 // - The current bundle, if no further splits happen, will have done exactly
-//   the work under primary_roots.
+//   the work under primary_roots and all elements up to and including the
+//   channel splits last_primary_element.
+//

Review Comment:
   It might be valuable to have a conceptual summary here, e.g. "This allows the SDK to relinquish ownership of and commit to not process some of the elements that it may have been sent (the residual) while retaining ownership and commitment to finish the other portion (the primary)."



##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -484,24 +522,38 @@ message ProcessBundleSplitResponse {
   // (if the bundle is large) and often a more efficient representation
   // on the runner side (e.g. if the set of elements can be represented
   // as some range in an underlying dataset).
+  //
+  // Note that for a split the following properties must hold:
+  // - last_primary_element < first_residual_element
+  // - primary roots and residual roots can only be specified if the
+  //   last_primary_element + 1 < first_residual_element
+  //   (typically there is one primary and residual root per element in the

Review Comment:
   Maybe say something about primary root and residual roots being a disjoint but full coverage of the work represented by the elements between last_primary_element and first_residual_element?



##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -438,42 +441,77 @@ message ProcessBundleSplitRequest {
     // possible and returning the remainder).
     double fraction_of_remainder = 1;
 
-    // A set of allowed element indices where the SDK may split. When this is
-    // empty, there are no constraints on where to split.
+    // (Optional) A set of allowed element indices where the SDK may split. When
+    // this is empty, there are no constraints on where to split.
     repeated int64 allowed_split_points = 3;
 
-    // (Required for GrpcRead operations) Number of total elements expected
-    // to be sent to this GrpcRead operation, required to correctly account
-    // for unreceived data when determining where to split.
+    // (Required for gRPC Read operation transforms) Number of total elements
+    // expected to be sent to this GrpcRead operation, required to correctly
+    // account for unreceived data when determining where to split.
     int64 estimated_input_elements = 2;
-
-    // TODO(SDF): Allow providing weights rather than sizes.
   }
 
   // (Required) Specifies the desired split for each transform.
   //
-  // Currently only splits at GRPC read operations are supported.
+  // Currently only splits at gRPC read operations are supported.
   // This may, of course, limit the amount of work downstream operations
   // receive.
   map<string, DesiredSplit> desired_splits = 3;
 }
 
-// Represents a partition of the bundle: a "primary" and
-// a "residual", with the following properties:
+// Represents a partition of the bundle: a "primary" and a "residual", with the
+// following properties:
 // - The work in primary and residual doesn't overlap, and combined, adds up
 //   to the work in the current bundle if the split hadn't happened.
 // - The current bundle, if it keeps executing, will have done none of the
-//   work under residual_roots.
+//   work under residual_roots and none of the elements in the

Review Comment:
   complete sentence 



##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -484,24 +522,38 @@ message ProcessBundleSplitResponse {
   // (if the bundle is large) and often a more efficient representation
   // on the runner side (e.g. if the set of elements can be represented
   // as some range in an underlying dataset).
+  //
+  // Note that for a split the following properties must hold:
+  // - last_primary_element < first_residual_element
+  // - primary roots and residual roots can only be specified if the
+  //   last_primary_element + 1 < first_residual_element
+  //   (typically there is one primary and residual root per element in the
+  //   range (last_primary_element, first_residual_element))
+  //
+  // Note that subsequent splits of the same bundle must ensure that:
+  // - the last_primary_element does not increase
+  // - the first_residual_element does not increase
+  // - the first_residual_element does not decrease if there were residual
+  //    or primary roots returned in a prior split.
   message ChannelSplit {
     // (Required) The grpc read transform reading this channel.
     string transform_id = 1;
 
-    // The last element of the input channel that should be entirely considered
-    // part of the primary, identified by its absolute index in the (ordered)
-    // channel.
+    // (Required) The last element of the input channel that should be entirely
+    // considered part of the primary, identified by its absolute index in the

Review Comment:
   zero-based index? (similarly below)



##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -324,6 +324,9 @@ message ProcessBundleResponse {
   // following applications need to be scheduled and executed in the future.
   // A runner that does not yet support residual roots MUST still check that
   // this is empty for correctness.
+  //
+  // Note that these residual roots may not have been returned as part of a

Review Comment:
   should this read "must not"?



##########
model/fn-execution/src/main/proto/org/apache/beam/model/fn_execution/v1/beam_fn_api.proto:
##########
@@ -484,24 +522,38 @@ message ProcessBundleSplitResponse {
   // (if the bundle is large) and often a more efficient representation
   // on the runner side (e.g. if the set of elements can be represented
   // as some range in an underlying dataset).
+  //
+  // Note that for a split the following properties must hold:
+  // - last_primary_element < first_residual_element
+  // - primary roots and residual roots can only be specified if the
+  //   last_primary_element + 1 < first_residual_element
+  //   (typically there is one primary and residual root per element in the
+  //   range (last_primary_element, first_residual_element))
+  //
+  // Note that subsequent splits of the same bundle must ensure that:
+  // - the last_primary_element does not increase

Review Comment:
   This is not a requirement. The underlying requirement is that primary_n + residual_n = primary_{n-1}.
   
   In practice part or all of last_primary_element + 1 is often part of the previous residual. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org