Posted to github@beam.apache.org by "lostluck (via GitHub)" <gi...@apache.org> on 2023/04/19 22:44:53 UTC

[GitHub] [beam] lostluck commented on a diff in pull request #26188: [Go SDK]: Implement fileio.MatchContinuously transform

lostluck commented on code in PR #26188:
URL: https://github.com/apache/beam/pull/26188#discussion_r1171919456


##########
sdks/go/pkg/beam/io/fileio/match.go:
##########
@@ -187,3 +198,146 @@ func metadataFromFiles(
 
 	return metadata, nil
 }
+
+// duplicateTreatment controls how duplicate matches are treated.
+type duplicateTreatment int
+
+const (
+	// duplicateAllow allows duplicate matches.
+	duplicateAllow duplicateTreatment = iota
+	// duplicateSkip skips duplicate matches.
+	duplicateSkip
+)
+
+type matchContOption struct {
+	Start              time.Time
+	End                time.Time
+	DuplicateTreatment duplicateTreatment
+	ApplyWindow        bool
+}
+
+// MatchContOptionFn is a function that can be passed to MatchContinuously to configure options for
+// matching files.
+type MatchContOptionFn func(*matchContOption)
+
+// MatchStart specifies the start time for matching files.
+func MatchStart(start time.Time) MatchContOptionFn {
+	return func(o *matchContOption) {
+		o.Start = start
+	}
+}
+
+// MatchEnd specifies the end time for matching files.
+func MatchEnd(end time.Time) MatchContOptionFn {
+	return func(o *matchContOption) {
+		o.End = end
+	}
+}
+
+// MatchDuplicateAllow specifies that file path matches will not be deduplicated.
+func MatchDuplicateAllow() MatchContOptionFn {
+	return func(o *matchContOption) {
+		o.DuplicateTreatment = duplicateAllow
+	}
+}
+
+// MatchDuplicateSkip specifies that file path matches will be deduplicated.
+func MatchDuplicateSkip() MatchContOptionFn {
+	return func(o *matchContOption) {
+		o.DuplicateTreatment = duplicateSkip
+	}
+}
+
+// MatchApplyWindow specifies that each element will be assigned to an individual window.
+func MatchApplyWindow() MatchContOptionFn {
+	return func(o *matchContOption) {
+		o.ApplyWindow = true
+	}
+}
+
+// MatchContinuously finds all files matching the glob pattern at the given interval and returns a
+// PCollection<FileMetadata> of the matching files. MatchContinuously accepts a variadic number of
+// MatchContOptionFn that can be used to configure:
+//
+//   - Start: start time for matching files. Defaults to the current timestamp
+//   - End: end time for matching files. Defaults to the maximum timestamp
+//   - DuplicateAllow: allow emitting matches that have already been observed. Defaults to false
+//   - DuplicateSkip: skip emitting matches that have already been observed. Defaults to true

Review Comment:
   I'll review PR 26192 first then.
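
For readers following the hunk above, here is a minimal, standalone sketch of the functional-options pattern it introduces. The option type and constructors mirror the quoted diff; `resolveOptions` and `exampleOptions` are hypothetical helpers added for illustration only, and the defaulting shown (Start set to a supplied time, everything else left at its zero value) is an assumption, not what the PR's MatchContinuously necessarily does.

```go
package main

import (
	"fmt"
	"time"
)

// duplicateTreatment and matchContOption mirror the types in the quoted diff.
type duplicateTreatment int

const (
	duplicateAllow duplicateTreatment = iota // zero value
	duplicateSkip
)

type matchContOption struct {
	Start              time.Time
	End                time.Time
	DuplicateTreatment duplicateTreatment
	ApplyWindow        bool
}

// MatchContOptionFn mutates an option struct, as in the diff.
type MatchContOptionFn func(*matchContOption)

func MatchEnd(end time.Time) MatchContOptionFn {
	return func(o *matchContOption) { o.End = end }
}

func MatchDuplicateSkip() MatchContOptionFn {
	return func(o *matchContOption) { o.DuplicateTreatment = duplicateSkip }
}

// resolveOptions is a HYPOTHETICAL helper (not from the PR): it seeds the
// struct with an assumed default Start and then applies each option in order,
// so later options override earlier ones.
func resolveOptions(now time.Time, opts ...MatchContOptionFn) matchContOption {
	o := matchContOption{Start: now}
	for _, fn := range opts {
		fn(&o)
	}
	return o
}

// exampleOptions is a HYPOTHETICAL demo: a one-hour matching window with
// deduplication enabled.
func exampleOptions() matchContOption {
	now := time.Date(2023, time.April, 19, 0, 0, 0, 0, time.UTC)
	return resolveOptions(now, MatchEnd(now.Add(time.Hour)), MatchDuplicateSkip())
}

func main() {
	o := exampleOptions()
	fmt.Println(o.End.Sub(o.Start), o.DuplicateTreatment == duplicateSkip, o.ApplyWindow)
}
```

One thing the sketch makes visible: because `duplicateAllow` is the `iota` zero value, an option struct with no explicit default would allow duplicates. The doc comment in the hunk lists skipping as the default, so the transform would presumably need to set that explicitly in code not shown in the quoted lines.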


