Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 20:47:11 UTC

[GitHub] [beam] damccorm opened a new issue, #20969: Progressing watermark for not available Kinesis stream

damccorm opened a new issue, #20969:
URL: https://github.com/apache/beam/issues/20969

   We use Dataflow with Apache Beam to read events from Kinesis streams. We recently noticed that when one of the streams became unavailable in the middle of event processing (because it was removed or because of a credentials problem), the data watermark for that stream kept advancing.
   
   Consider this scenario:
    1. Permissions allow reading from stream A.
    2. Data is read from stream A.
    3. Permissions are changed and no longer allow reading from stream A.
    4. The watermark for stream A keeps advancing, although no stream data is read because of the permissions issue.
    5. Permissions are fixed so that stream A can be read again.
    6. Data is read from stream A again, but starting from the advanced watermark.
   
   As a result, the data written to the stream between steps 3 and 5 is lost, and the client is not made aware of it.
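
A toy model may help illustrate why an advancing watermark matters here. It does not use Beam or the Kinesis client. It assumes (this is not confirmed by the report) that the watermark followed wall-clock time during the outage and that records whose timestamps fall behind the watermark are treated as late and dropped. The class name `WatermarkGapDemo`, the method `countLate`, and all timestamps are made up for illustration.

```java
import java.time.Instant;
import java.util.List;

public class WatermarkGapDemo {
    /** Counts records whose event time is strictly before the watermark, i.e. late records. */
    static long countLate(List<Instant> eventTimes, Instant watermark) {
        return eventTimes.stream().filter(t -> t.isBefore(watermark)).count();
    }

    public static void main(String[] args) {
        // Hypothetical timeline: access is lost at outageStart and restored at outageEnd.
        Instant outageStart = Instant.parse("2021-05-01T10:00:00Z");
        Instant outageEnd = Instant.parse("2021-05-01T10:30:00Z");

        // Records written to the stream while the reader had no access.
        List<Instant> writtenDuringOutage = List.of(
            outageStart.plusSeconds(60),
            outageStart.plusSeconds(600),
            outageStart.plusSeconds(1200));

        // Assumed faulty behaviour: the watermark followed wall-clock time during the outage.
        Instant advancedWatermark = outageEnd;
        // Alternative: the watermark was held at the time of the last successful read.
        Instant heldWatermark = outageStart;

        System.out.println("late with advancing watermark: "
            + countLate(writtenDuringOutage, advancedWatermark)); // prints 3
        System.out.println("late with held watermark: "
            + countLate(writtenDuringOutage, heldWatermark));     // prints 0
    }
}
```

Under these assumptions, every record written during the outage is already behind the watermark once access is restored, whereas a held watermark leaves none of them late.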
   
   This is also confusing in the Dataflow console, where the advancing watermark suggests that events are still being read from the stream. It also makes the watermark hard to rely on as a source metric for alerting.
   
   A brief investigation suggests that the _KinesisReader.getWatermark()_ logic may not consider the state of the stream, i.e. whether it is available or not, and that it treats a removed stream as a stream without traffic. The watermark calculation should be adjusted to take that information into account.
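
One possible shape for such an adjustment is sketched below. This is a standalone illustration, not Beam code: `AvailabilityAwareWatermark` and its methods are hypothetical names, and how the callbacks would be wired into `KinesisReader` is not shown. The idea is to advance the watermark only after a successful read (with or without records) and to leave it untouched after a failed read, so that an unavailable stream is not mistaken for an idle one.

```java
import java.time.Instant;

/** Hypothetical helper, not part of Beam: a watermark that only moves while the stream is readable. */
public class AvailabilityAwareWatermark {
    private Instant watermark;
    private boolean streamAvailable = true;

    public AvailabilityAwareWatermark(Instant initial) {
        this.watermark = initial;
    }

    /** Call after a successful read that returned a record with the given arrival time. */
    public void onRecord(Instant arrivalTime) {
        streamAvailable = true;
        advanceTo(arrivalTime);
    }

    /** Call after a successful read that returned nothing: the stream is idle but reachable. */
    public void onEmptyRead(Instant now) {
        streamAvailable = true;
        advanceTo(now);
    }

    /** Call when a read fails (for example access denied or stream not found); the watermark is held. */
    public void onReadFailure() {
        streamAvailable = false;
    }

    public Instant getWatermark() {
        return watermark;
    }

    /** Exposed so that monitoring can tell "idle" apart from "unavailable". */
    public boolean isStreamAvailable() {
        return streamAvailable;
    }

    // The watermark never moves backwards.
    private void advanceTo(Instant candidate) {
        if (candidate.isAfter(watermark)) {
            watermark = candidate;
        }
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2021-05-01T10:00:00Z");
        AvailabilityAwareWatermark wm = new AvailabilityAwareWatermark(t0);
        wm.onRecord(t0.plusSeconds(10));
        wm.onReadFailure(); // outage: the watermark stays at t0 + 10s
        System.out.println(wm.getWatermark() + " available=" + wm.isStreamAvailable());
    }
}
```

Advancing to the current time on an empty read is a simplification chosen for this sketch; a real policy might apply an idle threshold first. The availability flag does not influence the watermark itself and is there only so that alerting has something more reliable to key on.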
   
   Imported from Jira [BEAM-12406](https://issues.apache.org/jira/browse/BEAM-12406). Original Jira may contain additional context.
   Reported by: mateuszratajocado.

