You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/29 02:54:07 UTC

[GitHub] [spark] HeartSaVioR commented on issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be closed if any task is using it - adds inuse tracking.

HeartSaVioR commented on issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be closed if any task is using it - adds inuse tracking.
URL: https://github.com/apache/spark/pull/19096#issuecomment-477846977
 
 
   I'm really sorry to say on the PR which already goes through pretty long discussions and actually I asked to resurrect, so blame me please.
   
   First of all, I guess the patch is addressing the issue properly in current caching approach, and we should adopt this patch if we stick with current approach. (That's why I said the patch is still needed and asked to resurrect this.)
   
   Revisiting the current caching approach based on the fact, the caching approach now requires concerning about thread-safety, and to handle it we are also adding flags and counters to handle, making cached producer instances being stateful (except the things Kafka producer itself may have), which would make us not easy to understand and maintain the code.
   
   If I remember it correctly, in SPARK-25151 (#22138 - the issue is still open and valid I think) someone (I guess Gabor) asked me "why don't you apply commons-pool to producer as well", I answered producer is thread-safe so can apply current cache approach whereas consumer is not thread-safe so it should have to be isolated (good to apply general connection pool implementation).
   
   Now I feel it's worth to consider applying it to producer as well (isolating per thread) which keeps producer to be stateless (same here, except the things Kafka producer itself may have) - it should require more connections to Kafka if there're concurrent usages, but once the number of necessary producers are pooled it would not hurt, which we expect for long running query like streaming one.
   
   Would like to hear everyone's voices. Thanks in advance!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org