Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/04/01 19:50:18 UTC

[GitHub] [spark] squito commented on issue #24175: [SPARK-27232][SQL] Ignore file locality in InMemoryFileIndex if spark.locality.wait is set to zero

URL: https://github.com/apache/spark/pull/24175#issuecomment-478720781
 
 
   @LantaoJin just pointed me at this based on some discussion in https://github.com/apache/spark/pull/23951.  I totally understand the use case for this, but it needs to use a new config.  Even with locality wait == 0, Spark still tries to schedule tasks to take advantage of locality; it just means Spark won't *wait* for an offer with better locality.  In fact, I regularly recommend that users set locality wait to 0 even on colocated clusters.
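   To make the distinction concrete, here is a sketch of the setting in question (the value is illustrative):
   
   ```properties
   # spark-defaults.conf -- illustrative sketch
   # With a wait of 0s the scheduler still *prefers* offers with better
   # locality; it just never delays a task waiting for one to appear.
   spark.locality.wait  0s
   ```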
   
   Furthermore, even on disagg clusters, you don't necessarily want to set *all* the locality waits to 0, right?  I mean, you might still want to wait for locality for data persisted from cached RDDs?
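   As a sketch of what I mean, the per-level waits can be tuned independently rather than zeroing the top-level setting (values here are illustrative, not a recommendation):
   
   ```properties
   # spark-defaults.conf -- illustrative, not a recommendation
   # Relax node/rack waits because the input data lives remotely anyway...
   spark.locality.wait.node     0s
   spark.locality.wait.rack     0s
   # ...but keep a process-local wait so cached-RDD locality still pays off.
   spark.locality.wait.process  3s
   ```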
   
   https://github.com/apache/spark/pull/23951 pointed out a case for skipping rack resolution entirely on disagg clusters.  This is another good case.  I'm not entirely sure they should be controlled by the same setting ... I wonder whether there is some HDFS-specific signal that would be appropriate here.  E.g., you might have a "semi" disagg cluster with most data living remotely but some small local HDFS.  I'm not sure there is an easy way to figure this out.
