Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/08 07:16:50 UTC

[GitHub] [spark] ivoson opened a new pull request #34220: Fast fail when withWatermark called on non-streaming dataset

ivoson opened a new pull request #34220:
URL: https://github.com/apache/spark/pull/34220


   ### What changes were proposed in this pull request?
   Fail fast when [Dataset.withWatermark](https://github.com/apache/spark/blob/v3.2.0-rc7/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L740) is called on a non-streaming Dataset.
   
   ### Why are the changes needed?
   Currently, withWatermark can be called on a non-streaming Dataset, and a dedicated rule removes it during the analysis phase. A user can call this API and nothing happens, which may be confusing.
   Since this usage is not expected, we can simply fail fast with an explicit error message, and then we no longer need to keep an extra rule just for this case.
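To make the current and proposed behaviors concrete, here is a toy Python model. It is hypothetical and does not use Spark: `Relation`, `WatermarkNode`, `eliminate_watermark`, `with_watermark_current` and `with_watermark_fail_fast` are illustrative names only. `eliminate_watermark` loosely mirrors what, as far as we understand, Spark's `EliminateEventTimeWatermark` analyzer rule does (drop a watermark node whose child is not streaming), and `with_watermark_fail_fast` sketches the check this PR proposes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:
    """Toy leaf node (hypothetical): a data source that is either streaming or batch."""
    name: str
    is_streaming: bool


@dataclass(frozen=True)
class WatermarkNode:
    """Toy plan node (hypothetical) recording an event-time column and a delay threshold."""
    event_time: str
    delay: str
    child: "Plan"

    @property
    def is_streaming(self) -> bool:
        # A watermark node is streaming exactly when its input is.
        return self.child.is_streaming


Plan = Union[Relation, WatermarkNode]


def eliminate_watermark(plan: Plan) -> Plan:
    """Current behavior: silently drop a watermark node whose child is a batch plan."""
    if isinstance(plan, WatermarkNode) and not plan.child.is_streaming:
        return plan.child
    return plan


def with_watermark_current(plan: Plan, event_time: str, delay: str) -> Plan:
    """Accepts any plan; on a batch plan the watermark is removed again (a no-op)."""
    return eliminate_watermark(WatermarkNode(event_time, delay, plan))


def with_watermark_fail_fast(plan: Plan, event_time: str, delay: str) -> Plan:
    """Proposed behavior: reject the call up front when the plan is not streaming."""
    if not plan.is_streaming:
        raise ValueError("withWatermark is only supported on streaming Datasets")
    return WatermarkNode(event_time, delay, plan)


batch = Relation("events_table", is_streaming=False)
stream = Relation("events_stream", is_streaming=True)

print(with_watermark_current(batch, "ts", "10 minutes") == batch)  # True: no-op on batch
print(with_watermark_current(stream, "ts", "10 minutes"))          # WatermarkNode over stream
try:
    with_watermark_fail_fast(batch, "ts", "10 minutes")
except ValueError as err:
    print(f"failed fast: {err}")
```

Under the current behavior the batch call quietly returns the original plan, which is the confusion the description points to; under the fail-fast variant the same call raises, which is the trade-off debated in the review comments.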
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Unit tests added.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34220: [SPARK-36954][SQL] Fast fail when withWatermark called on non-streaming dataset

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34220:
URL: https://github.com/apache/spark/pull/34220#issuecomment-938425545


   Can one of the admins verify this patch?


[GitHub] [spark] HeartSaVioR commented on pull request #34220: [SPARK-36954][SQL] Fast fail when withWatermark called on non-streaming dataset

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #34220:
URL: https://github.com/apache/spark/pull/34220#issuecomment-941792480


   Sorry for the late reply. Technically, batch and streaming should produce the same result if the query has a perfect watermark. If there is a case that doesn't respect this and the operation is supported in both batch and streaming, we should try to fix it.
   
   Batch assumes a perfect watermark because all inputs are available when the query runs, so it's legitimate to treat withWatermark as a no-op in a batch query. And as @zsxwing said, we are trying to make it easy to switch a query between batch and streaming.
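The perfect-watermark argument can be illustrated with a deliberately simplified Python model. The helper names are hypothetical and this is not Spark's API or its exact late-data semantics: events are counted in tumbling windows; the streaming side drops any event older than the watermark (max event time seen so far minus `delay`, advanced after each micro-batch); the batch side sees every event at once, so nothing is late. When `delay` is large enough that no event ends up late, the streaming counts match the batch counts, which is why treating withWatermark as a no-op in batch stays consistent.

```python
from collections import Counter
from typing import List


def window_start(ts: int, size: int) -> int:
    """Start of the tumbling window of length `size` that contains event time `ts`."""
    return ts - ts % size


def batch_counts(events: List[int], size: int) -> Counter:
    """Batch: every event is available up front, so none is late (a perfect watermark)."""
    return Counter(window_start(ts, size) for ts in events)


def streaming_counts(micro_batches: List[List[int]], size: int, delay: int) -> Counter:
    """Simplified streaming: events older than the current watermark are dropped as late.

    The watermark is advanced after each micro-batch to (max event time seen - delay).
    """
    counts: Counter = Counter()
    watermark = float("-inf")
    max_seen = float("-inf")
    for batch in micro_batches:
        for ts in batch:
            if ts >= watermark:
                counts[window_start(ts, size)] += 1
        if batch:
            max_seen = max(max_seen, max(batch))
            watermark = max_seen - delay
    return counts


arrivals = [[10, 12], [25], [11]]  # event time 11 arrives out of order, in the last micro-batch
all_events = [ts for mb in arrivals for ts in mb]

print(batch_counts(all_events, 10))              # window 10 has 3 events, window 20 has 1
print(streaming_counts(arrivals, 10, delay=20))  # delay covers the lateness: same as batch
print(streaming_counts(arrivals, 10, delay=5))   # event 11 is behind the watermark and dropped
```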


[GitHub] [spark] zsxwing commented on pull request #34220: [SPARK-36954][SQL] Fast fail when withWatermark called on non-streaming dataset

Posted by GitBox <gi...@apache.org>.
zsxwing commented on pull request #34220:
URL: https://github.com/apache/spark/pull/34220#issuecomment-941149021


   Thanks for the contribution! This behavior exists so that users can easily switch their streaming queries to batch queries for debugging. Failing the batch query would make debugging annoying.


[GitHub] [spark] ivoson commented on pull request #34220: [SPARK-36954][SQL] Fast fail when withWatermark called on non-streaming dataset

Posted by GitBox <gi...@apache.org>.
ivoson commented on pull request #34220:
URL: https://github.com/apache/spark/pull/34220#issuecomment-941874642


   Thanks for pointing this out; I will close the PR. @zsxwing @HeartSaVioR


[GitHub] [spark] ivoson commented on pull request #34220: [SPARK-36954][SQL] Fast fail when withWatermark called on non-streaming dataset

Posted by GitBox <gi...@apache.org>.
ivoson commented on pull request #34220:
URL: https://github.com/apache/spark/pull/34220#issuecomment-940677310


   cc @cloud-fan @zsxwing Could you review this?


[GitHub] [spark] cloud-fan commented on pull request #34220: [SPARK-36954][SQL] Fast fail when withWatermark called on non-streaming dataset

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #34220:
URL: https://github.com/apache/spark/pull/34220#issuecomment-940783314


   @HeartSaVioR 


[GitHub] [spark] ivoson closed pull request #34220: [SPARK-36954][SQL] Fast fail when withWatermark called on non-streaming dataset

Posted by GitBox <gi...@apache.org>.
ivoson closed pull request #34220:
URL: https://github.com/apache/spark/pull/34220


   

