Posted to user@spark.apache.org by ansriniv <an...@gmail.com> on 2014/05/30 05:47:14 UTC

getPreferredLocations

I am building my own custom RDD class.

1) Is there a guarantee that a partition will only be processed on a node
which is in the "getPreferredLocations" set of nodes returned by the RDD?

2) I am implementing this custom RDD in Java and plan to extend JavaRDD.
However, I don't see a "getPreferredLocations" method in
http://spark.apache.org/docs/latest/api/core/index.html#org.apache.spark.api.java.JavaRDD

Will I be able to implement my own custom RDD in Java and override the
"getPreferredLocations" method?

Thanks
Anand




Re: getPreferredLocations

Posted by Patrick Wendell <pw...@gmail.com>.
> 1) Is there a guarantee that a partition will only be processed on a node
> which is in the "getPreferredLocations" set of nodes returned by the RDD?

No, there isn't. By default Spark may schedule a task in a "non-preferred"
location after `spark.locality.wait` has expired.

http://spark.apache.org/docs/latest/configuration.html#scheduling

If you want locality to be treated as a hard constraint, you can set
spark.locality.wait to a very high value. Keep in mind, though, that this
will starve your job if all of the preferred locations are on nodes that
are not alive.
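
For example, a minimal sketch; the one-hour value is only an illustrative
assumption (the setting is interpreted as milliseconds in these releases):

import org.apache.spark.{SparkConf, SparkContext}

// Make the scheduler wait a very long time before giving up on the
// preferred locations and falling back to other nodes. The one-hour
// value (in milliseconds) is just an illustrative assumption.
val conf = new SparkConf()
  .setAppName("locality-constrained-job")
  .set("spark.locality.wait", "3600000")

val sc = new SparkContext(conf)

There are also per-level variants (spark.locality.wait.process,
spark.locality.wait.node, spark.locality.wait.rack) if you only care
about one locality level.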

>
> 2) I am implementing this custom RDD in Java and plan to extend JavaRDD.
> However, I don't see a "getPreferredLocations" method in

There is currently no support for writing custom RDD classes purely in
Java. At a minimum you'd need to write the internals in Scala, and then
you could write a Java wrapper around them.
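
If you do go the Scala route, a rough sketch of what such an RDD can look
like is below. MyPartition, the hosts argument, and readRecordsFrom are
made-up placeholders for whatever your storage system actually provides,
not part of the Spark API:

import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition that remembers which host holds its data.
class MyPartition(override val index: Int, val host: String) extends Partition

// Minimal custom RDD that hints the scheduler to run each partition's
// task on the host that stores that partition's data.
class MyCustomRDD(sc: SparkContext, hosts: Seq[String])
  extends RDD[String](sc, Nil) {

  override def getPartitions: Array[Partition] =
    hosts.zipWithIndex.map { case (host, i) => new MyPartition(i, host) }
      .toArray[Partition]

  override def compute(split: Partition, context: TaskContext): Iterator[String] = {
    val p = split.asInstanceOf[MyPartition]
    readRecordsFrom(p.host).iterator
  }

  // This is the locality hint the scheduler consults.
  override def getPreferredLocations(split: Partition): Seq[String] =
    Seq(split.asInstanceOf[MyPartition].host)

  // Placeholder for however you actually fetch a partition's data.
  private def readRecordsFrom(host: String): Seq[String] =
    Seq(s"record-from-$host")
}

// Usage: val rdd = new MyCustomRDD(sc, Seq("host1", "host2"))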

Can I ask, though: what is the reason you are trying to write a custom
RDD? This is usually a fairly advanced use case, e.g. writing a Spark
integration with a new storage system or something like that.