Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/09 20:40:59 UTC

[GitHub] [spark] HeartSaVioR commented on pull request #31522: [SPARK-34399][SQL] Add commit duration to SQL tab's graph node.

HeartSaVioR commented on pull request #31522:
URL: https://github.com/apache/spark/pull/31522#issuecomment-794426247


   I think the key questions here are "how long can committing take in the worst case" and "how frequently does that occur".
   
   I have no answer for the second one, since I only hear from customers when they complain, but I can answer the first one based on customers' cases. It's not just tens of seconds, of course. (I would say they only become concerned when the gap is "significant", not just a few more minutes.) It can be a couple of hours or even longer when HDFS is unhealthy. Most often their complaint about this behavior is "why did the Spark driver hang?", because nothing is logged during committing unless they have turned on DEBUG logging for the Hadoop code path.
   
   That said, I have mixed feelings about this. I agree that explaining the missing time range is important when we trace a problem back from the event log, but assuming the commit eventually finished, the log will show it. I would like to know the answer to the second question in production before making a decision, but if the case doesn't happen often, that might be something we can live with.
   
   And for the extreme case, such as committing taking hours, I think the more important thing is to log periodically so that end users can determine whether the Spark driver is hung or not, without having to enable DEBUG logging. Maybe this is off-topic, but if we are prioritizing these things, I'd say that is needed more.
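
To make the "log periodically" idea concrete, here is a rough sketch of what I have in mind. It is not existing Spark code: `CommitHeartbeat`, `runWithHeartbeat`, and the `log` callback are hypothetical names for illustration, and the real change would have to wrap the actual commit call and use Spark's own logging. The idea is just to run the blocking commit on the calling thread while a daemon thread emits an INFO-style message at a fixed interval.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper, not part of Spark: wraps a blocking action with periodic progress logs.
final class CommitHeartbeat {
    private CommitHeartbeat() {}

    /**
     * Runs {@code action} on the calling thread. While it runs, a daemon thread
     * passes a "still in progress" message to {@code log} every {@code intervalMillis}.
     * A final message is logged when the action ends, whether it succeeded or threw.
     */
    static <T> T runWithHeartbeat(String description,
                                  long intervalMillis,
                                  Consumer<String> log,
                                  Supplier<T> action) {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "commit-heartbeat");
            t.setDaemon(true); // never keep the JVM alive just for heartbeat logging
            return t;
        });
        long startNanos = System.nanoTime();
        ticker.scheduleAtFixedRate(() -> {
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            log.accept(description + " still in progress after " + elapsedMs + " ms");
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        try {
            return action.get();
        } finally {
            ticker.shutdownNow(); // stop the heartbeat as soon as the action returns or throws
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            log.accept(description + " ended after " + elapsedMs + " ms");
        }
    }
}
```

A caller would wrap the commit in something like `CommitHeartbeat.runWithHeartbeat("job commit", 60_000L, msg -> logger.info(msg), () -> doCommit())`, where `logger` and `doCommit` are placeholders. With that, a user looking at the driver log at the default level can tell a slow commit from a hung driver.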

