Posted to user@spark.apache.org by Jerry Lam <ch...@gmail.com> on 2015/08/05 00:43:07 UTC

Poor HDFS Data Locality on Spark-EC2

Hi Spark users and developers,

I have been trying to use spark-ec2. After I launched a Spark cluster
(1.4.1) with ephemeral HDFS (using Hadoop 2.4.0), I tried to execute a job
on data stored in the ephemeral HDFS. No matter what I try, there is no
data locality at all. For instance, filtering data and counting the
filtered results always runs at locality level "ANY". I tweaked the
spark.locality.wait.* settings, but they do not seem to make any
difference. My guess is that the hostnames cannot be resolved properly, so
Spark cannot match executors to the DataNodes holding the blocks. Has
anyone encountered this problem before?
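For concreteness, here is a minimal sketch of the kind of job that shows
the problem, run from spark-shell on the master (the HDFS path, filter
predicate, and wait values below are placeholders, not my exact job):

    // Minimal reproduction sketch; the path and predicate are placeholders.
    // `sc` is the SparkContext that spark-shell provides.
    val lines = sc.textFile("hdfs:///user/root/sample/input.txt")
    val errors = lines.filter(_.contains("ERROR"))
    println(errors.count())   // every task in the stage shows locality ANY

    // The locality settings I tweaked (example values), set via --conf or
    // spark-defaults.conf; each effectively defaults to 3s:
    //   spark.locality.wait        30s
    //   spark.locality.wait.node   30s
    //   spark.locality.wait.rack   30s

    // One way to test the hostname theory: compare the executor hosts Spark
    // knows about against the DataNode hostnames the NameNode reports
    // (`hdfs dfsadmin -report` on the master).
    sc.getExecutorMemoryStatus.keys.foreach(println)   // "host:port" entries

If the executor hostnames and the DataNode hostnames do not match (e.g.
internal EC2 names vs. public DNS names), that would explain why Spark
never schedules NODE_LOCAL tasks regardless of the wait settings.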

Best Regards,

Jerry