Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/15 03:11:40 UTC

[GitHub] [spark] viirya edited a comment on pull request #35854: [SPARK-38549][SS] Add numRowsDroppedByWatermark to SessionWindowStateStoreRestoreExec

viirya edited a comment on pull request #35854:
URL: https://github.com/apache/spark/pull/35854#issuecomment-1067516577


   We know `SessionWindowStateStoreSaveExec` comes after `SessionWindowStateStoreRestoreExec` in the operator order. So if input rows are dropped by `SessionWindowStateStoreRestoreExec`, they never reach later operators such as `SessionWindowStateStoreSaveExec`.
   
   That's why we observed that some rows seem to be dropped by the watermark, yet `numRowsDroppedByWatermark` does not report any.
   
   `SessionWindowStateStoreRestoreExec` is not a state store writer, so it doesn't have a `numRowsDroppedByWatermark` metric, but it does drop input rows using the watermark predicate. This is confusing to end users, because they cannot accurately measure the number of rows dropped by the watermark.
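   To illustrate the idea, here is a minimal Scala sketch of a restore-style operator that filters late rows and counts them as it drops them. It is not Spark's actual `SessionWindowStateStoreRestoreExec` code: `InputRow`, `DroppedRowsMetric`, and `filterByWatermark` are hypothetical stand-ins, and treating a row as late when its event time is at or before the watermark is an assumption about the predicate.

```scala
// Hypothetical sketch, not Spark's implementation: a restore-style operator
// that drops late rows by a watermark predicate and counts what it drops.

// Hypothetical input row: a session key plus an event time in milliseconds.
final case class InputRow(sessionKey: String, eventTimeMs: Long)

// Hypothetical stand-in for an SQL metric such as numRowsDroppedByWatermark.
final class DroppedRowsMetric {
  private var count: Long = 0L
  def add(n: Long): Unit = count += n
  def value: Long = count
}

object WatermarkFilterSketch {
  // Keeps rows newer than the watermark. Each row considered late
  // (assumed here: eventTimeMs <= watermarkMs) increments the metric,
  // so the drop is visible to users instead of happening silently.
  def filterByWatermark(
      rows: Iterator[InputRow],
      watermarkMs: Long,
      numRowsDroppedByWatermark: DroppedRowsMetric): Iterator[InputRow] =
    rows.filter { row =>
      val isLate = row.eventTimeMs <= watermarkMs
      if (isLate) numRowsDroppedByWatermark.add(1L)
      !isLate
    }
}
```

   Because `Iterator.filter` is lazy, the metric is only updated as downstream operators consume the rows; with a watermark of 5000 ms and rows at 1000, 5000 and 9000 ms, only the 9000 ms row is kept and the metric ends at 2 once the iterator is fully consumed.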
   
   Does it make sense to you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


