Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/23 13:26:25 UTC

[GitHub] [spark] srowen commented on a change in pull request #25552: [SPARK-28849][CORE] Add a number to control transferTo calls to avoid infinite loop in some occasional cases

srowen commented on a change in pull request #25552: [SPARK-28849][CORE] Add a number to control transferTo calls to avoid infinite loop in some occasional cases
URL: https://github.com/apache/spark/pull/25552#discussion_r317130417
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/util/Utils.scala
 ##########
 @@ -417,16 +418,19 @@ private[spark] object Utils extends Logging {
       input: FileChannel,
       output: WritableByteChannel,
       startPosition: Long,
-      bytesToCopy: Long): Unit = {
+      bytesToCopy: Long,
+      numTransferToCalls: Int): Unit = {
     val outputInitialState = output match {
       case outputFileChannel: FileChannel =>
         Some((outputFileChannel.position(), outputFileChannel))
       case _ => None
     }
     var count = 0L
+    var num = 0
     // In case transferTo method transferred less data than we have required.
-    while (count < bytesToCopy) {
+    while (count < bytesToCopy && num < numTransferToCalls) {
       count += input.transferTo(count + startPosition, bytesToCopy - count, output)
 
 Review comment:
   Can you log the return value of `transferTo` "locally" in some way to investigate further? I'm not sure how easy this is to reproduce. In the absence of evidence, I think checking for a return value <= 0 is the most that can safely be done. The issue needs to be understood better before any change is made.
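
   Something along these lines is what I have in mind (a sketch only, not tested), reusing the `input` / `output` / `startPosition` / `bytesToCopy` names from the method above:

   ```scala
   // Fail fast if transferTo stops making progress, instead of capping the number of calls.
   var count = 0L
   while (count < bytesToCopy) {
     val transferred = input.transferTo(count + startPosition, bytesToCopy - count, output)
     if (transferred <= 0) {
       throw new java.io.IOException(
         s"transferTo returned $transferred after copying $count of $bytesToCopy bytes")
     }
     count += transferred
   }
   ```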
   
   In any event, I don't think you can ever just stop copying data partway through and continue successfully. You would have to fail the job.
   
   If the job is merely slow, then at best you'd want to introduce some kind of timeout at a higher level, since there are all kinds of ways a job can be slow. That would also take care of your current situation.
   
   But you can always kill slow jobs directly, without any change to Spark. This change doesn't fix the underlying cause anyway (assuming the cause is outside Spark).
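
   For example, without touching Spark itself, a driver-side watchdog along these lines can cancel a job group that runs too long. This is only a sketch: `sc` is assumed to be your SparkContext, and the group id and the 30-minute budget are illustrative.

   ```scala
   // Tag the jobs doing the copy (call this before submitting them), then cancel
   // them from the driver if they exceed a time budget.
   sc.setJobGroup("copy-heavy-job", "jobs doing the big file copy", interruptOnCancel = true)

   val watchdog = new Thread(() => {
     Thread.sleep(30 * 60 * 1000L)         // illustrative 30-minute budget
     sc.cancelJobGroup("copy-heavy-job")   // no-op if the jobs already finished
   })
   watchdog.setDaemon(true)
   watchdog.start()
   ```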
