Posted to user@spark.apache.org by Praveen Seluka <ps...@qubole.com> on 2014/06/19 15:04:03 UTC

Getting started : Spark on YARN issue

I am trying to run Spark on YARN. I have a Hadoop 2.2 cluster (YARN +
HDFS) in EC2, and I compiled Spark using Maven with the Hadoop 2.2 profile.
Now I am trying to run the example Spark job in yarn-cluster mode.

From my *local machine* (I have set up the HADOOP_CONF_DIR environment
variable correctly):

➜  spark git:(master) ✗ /bin/bash -c "./bin/spark-submit --class
org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 2
--driver-memory 2g --executor-memory 2g --executor-cores 1
examples/target/scala-2.10/spark-examples_*.jar 10"
14/06/19 14:59:39 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/06/19 14:59:39 INFO client.RMProxy: Connecting to ResourceManager at
ec2-54-242-244-250.compute-1.amazonaws.com/54.242.244.250:8050
14/06/19 14:59:41 INFO yarn.Client: Got Cluster metric info from
ApplicationsManager (ASM), number of NodeManagers: 1
14/06/19 14:59:41 INFO yarn.Client: Queue info ... queueName: default,
queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
      queueApplicationCount = 0, queueChildQueueCount = 0
14/06/19 14:59:41 INFO yarn.Client: Max mem capabililty of a single
resource in this cluster 12288
14/06/19 14:59:41 INFO yarn.Client: Preparing Local resources
14/06/19 14:59:42 WARN hdfs.BlockReaderLocal: The short-circuit local reads
feature cannot be used because libhadoop cannot be loaded.
14/06/19 14:59:43 INFO yarn.Client: Uploading
file:/home/rgupta/awesome/spark/examples/target/scala-2.10/spark-examples_2.10-1.0.0-SNAPSHOT.jar
to hdfs://
ec2-54-242-244-250.compute-1.amazonaws.com:8020/user/rgupta/.sparkStaging/application_1403176373037_0009/spark-examples_2.10-1.0.0-SNAPSHOT.jar
14/06/19 15:00:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while
waiting for channel to be ready for connect. ch :
java.nio.channels.SocketChannel[connection-pending remote=/
10.180.150.66:50010]
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:532)
        at
org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1305)
        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1128)
        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
        at
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
14/06/19 15:00:45 INFO hdfs.DFSClient: Abandoning
BP-1714253233-10.180.215.105-1403176367942:blk_1073741833_1009
14/06/19 15:00:46 INFO hdfs.DFSClient: Excluding datanode
10.180.150.66:50010
14/06/19 15:00:46 WARN hdfs.DFSClient: DataStreamer Exception

It is able to talk to the ResourceManager, and then it uploads the example
jar to HDFS, which is where it fails: the write to the datanode times out.
I verified that port 50010 is accessible from my local machine. Any idea
what the issue is here?
One thing that looks suspicious is */10.180.150.66:50010* - it appears to
be connecting using the private IP. If so, how can I get it to use the
public IP?
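A quick way to double-check reachability of the datanode port from the local machine is a plain TCP connect. This is a minimal sketch, not part of the original mail; the address used in the comment is the one from the log above:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False

# e.g. can_connect("10.180.150.66", 50010)  # the datanode address from the log
```

Note that a successful connect only shows the port is open from this machine; the DFSClient still fails if the NameNode hands back an address that is not routable from outside EC2.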

Thanks
Praveen

Re: Getting started : Spark on YARN issue

Posted by Praveen Seluka <ps...@qubole.com>.
Hi Andrew,

Thanks for your suggestion. I updated hdfs-site.xml on the server side,
and also on the client side, to use hostnames instead of IPs, as described
here =>
http://rainerpeter.wordpress.com/2014/02/12/connect-to-hdfs-running-in-ec2-using-public-ip-addresses/
Now I can see that the client is able to talk to the datanode.
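For reference, the client-side property the linked post relies on looks like the following hdfs-site.xml fragment. This is a sketch of the standard setting, not copied from the mail; verify the property name against your Hadoop version:

```xml
<!-- hdfs-site.xml on the client: ask the NameNode for datanode
     hostnames instead of IPs, so DNS can resolve them to public
     addresses from outside EC2 -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```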

I will also consider submitting the application from within EC2 itself, so
that the private IPs are resolvable.

Thanks
Praveen



Re: Getting started : Spark on YARN issue

Posted by Andrew Or <an...@databricks.com>.
(Also, an easier workaround is to simply submit the application from
within your cluster, thus saving you all the manual labor of reconfiguring
everything to use public hostnames. This may or may not be applicable to
your use case.)



Re: Getting started : Spark on YARN issue

Posted by Andrew Or <an...@databricks.com>.
Hi Praveen,

Yes, the fact that it is trying to use a private IP from outside of the
cluster is suspicious. My guess is that your HDFS is configured to use
internal IPs rather than external IPs. This means that even though the
Hadoop confs on your local machine only use external IPs, the
org.apache.spark.deploy.yarn.Client running on your local machine tries to
use whatever address your HDFS NameNode tells it to use, which is private
in this case.

A potential fix is to update your hdfs-site.xml (and other related
configs) within your cluster to use public hostnames. Let me know if that
does the job.
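Concretely, the cluster-side change can be sketched as the following hdfs-site.xml fragment. This uses the standard HDFS hostname setting and is an editor's illustration, not something the mail spells out:

```xml
<!-- hdfs-site.xml on each datanode: register with the NameNode
     using the node's hostname rather than its bound (private) IP -->
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
</property>
```

The datanodes then need hostnames that resolve to their public addresses from outside the cluster for this to help an external client.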

Andrew

