You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 23:54:29 UTC

[GitHub] [beam] kennknowles opened a new issue, #19411: Spark unbounded source advances watermarks prematurely

kennknowles opened a new issue, #19411:
URL: https://github.com/apache/beam/issues/19411

   `SparkUnboundedSource` will advance the watermark to the MAX of the watermark of any partition. You can see it at [https://github.com/apache/beam/blob/fab12c772d461fc8db4b3c361d38fe2781926fff/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java#L204](https://github.com/apache/beam/blob/fab12c772d461fc8db4b3c361d38fe2781926fff/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java#L204) .
   
   This should be the MIN - this is a combining of watermarks - not advancing. This currently means the watermark moves too quickly and the slowest partition of an unbounded source has elements that are routinely marked late.
   
   Imported from Jira [BEAM-7423](https://issues.apache.org/jira/browse/BEAM-7423). Original Jira may contain additional context.
   Reported by: mikekap.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org