Posted to user@spark.apache.org by Mohit Singh <mo...@gmail.com> on 2014/02/20 22:43:05 UTC
Running spark on cluster
Hi,
I am trying to run Spark in standalone cluster mode,
but I am not sure whether it is executing on the cluster at all.
So basically, I have an 8-node cluster:
master
node01
node02
....
node07
Spark is installed on the master at ~/spark,
and I have copied ~/spark across all the nodes.
Then I use the pyspark shell:
I load a 10 GB file (present on the master's local disk, not HDFS)
and try to count the number of lines. I see this:
14/02/20 12:54:29 INFO TaskSetManager: Finished TID 420 in 233 ms on
localhost (progress: 420/922)
14/02/20 12:54:29 INFO DAGScheduler: Completed ResultTask(0, 420)
14/02/20 12:54:29 INFO BlockManager: Found block broadcast_0 locally
14/02/20 12:54:29 INFO HadoopRDD: Input split:
file:/home/hadoop/data/backup/data/domain/domainz0:14126415872+33554432
.....
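(Aside: that HadoopRDD line encodes the split as path:start+length. A small decoding helper, plain Python and not part of Spark; the split index it derives assumes all splits share the 32 MB size shown in the log:)

```python
# Decode a HadoopRDD "Input split" log line of the form path:start+length.
# The last value (start // length) is the split index, assuming uniform
# 32 MB (33554432-byte) splits as in the log above.
SPLIT = "file:/home/hadoop/data/backup/data/domain/domainz0:14126415872+33554432"

def decode_split(line):
    path, rest = line.rsplit(":", 1)
    start, length = (int(x) for x in rest.split("+"))
    return path, start, length, start // length

print(decode_split(SPLIT))
# -> ('file:/home/hadoop/data/backup/data/domain/domainz0', 14126415872, 33554432, 421)
```

With 922 such splits, the "progress: 420/922" lines line up with split indices like this one.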
I am starting the pyspark shell as MASTER=spark://master:7070 ./pyspark
Though when I start pyspark, I do see this error:
14/02/20 12:59:45 INFO HttpBroadcast: Broadcast server started at
http://10.2.1.18:53329
14/02/20 12:59:45 INFO SparkEnv: Registering MapOutputTracker
14/02/20 12:59:45 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-c0b1aebb-1995-4849-873f-eb1a534b4af2
14/02/20 12:59:45 INFO HttpServer: Starting HTTP Server
14/02/20 12:59:45 WARN AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:286)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at
org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:118)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:118)
at
org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:129)
at org.apache.spark.ui.SparkUI.bind(SparkUI.scala:57)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:159)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:744)
14/02/20 12:59:45 WARN AbstractLifeCycle: FAILED
org.eclipse.jetty.server.Server@37a8d4d2: java.net.BindException: Address
already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at
org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at
org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:286)
at
org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at
org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:118)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:118)
at
org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:129)
at org.apache.spark.ui.SparkUI.bind(SparkUI.scala:57)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:159)
at
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
at py4j.Gateway.invoke(Gateway.java:214)
at
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:744)
14/02/20 12:59:45 INFO JettyUtils: Failed to create UI at port, 4040.
Trying again.
14/02/20 12:59:45 INFO JettyUtils: Error was:
Failure(java.net.BindException: Address already in use)
14/02/20 12:59:45 INFO JettyUtils: Failed to create UI at port, 4040. Trying
again.
14/02/20 12:59:45 INFO JettyUtils: Error was:
Failure(java.net.BindException: Address already in use)
14/02/20 12:59:45 INFO SparkUI: Started Spark Web UI at http://master:4041
14/02/20 12:59:45 INFO AppClient$ClientActor: Connecting to master
spark://master:7070...
14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@master:7070:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@master:7070]
14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@master:7070:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@master:7070]
14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@master:7070:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@master:7070]
14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@master:7070:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://sparkMaster@master:7070]
After all this, the pyspark shell starts anyway.
And if I kill whatever is listening on port 4040, then things work just fine.
Can someone help me with this?
Thanks
--
Mohit
"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates
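A quick sanity check on that master URL: the standalone master listens on port 7077 by default, which would explain the repeated "Association failed with [akka.tcp://sparkMaster@master:7070]" warnings above. A tiny helper (plain Python, not part of Spark, names made up) to flag that kind of mistake:

```python
from urllib.parse import urlparse

def check_master_url(url, default_port=7077):
    """Flag common mistakes in a standalone master URL."""
    p = urlparse(url)
    problems = []
    if p.scheme != "spark":
        problems.append("scheme should be spark://")
    if p.port != default_port:
        problems.append("port %s is not the standalone default %s" % (p.port, default_port))
    return problems

print(check_master_url("spark://master:7070"))
# -> ['port 7070 is not the standalone default 7077']
```

The exact port to use is whatever the master's own startup log or status page reports, so treat 7077 as the default, not a certainty.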
Re: Running spark on cluster
Posted by Mayur Rustagi <ma...@gmail.com>.
lynx for accessing the web UI on the console :)
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi
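If a text browser is not available either, the master's status page (default http://master:8080) can be scraped from the console. A rough sketch, with a made-up HTML fragment standing in for the real page, since the page layout varies by Spark version:

```python
import re

# Made-up fragment in the shape of the standalone master's status page;
# the real page (default port 8080) lists running applications in a table.
SAMPLE = """
<tr><td>app-20140220145912-0000</td><td>PySparkShell</td><td>RUNNING</td></tr>
"""

def running_apps(html):
    """Pull (id, name, state) rows for applications out of a status page."""
    return re.findall(r"<td>(app-[\w-]+)</td><td>([^<]+)</td><td>([^<]+)</td>", html)

print(running_apps(SAMPLE))
# -> [('app-20140220145912-0000', 'PySparkShell', 'RUNNING')]
```

If the application list is empty while a job is running, the shell is not attached to the cluster at all.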
On Thu, Feb 20, 2014 at 2:52 PM, Mohit Singh <mo...@gmail.com> wrote:
> Ok.. That issue is fixed. I didn't set the master and port on the
> SparkContext...
> Now, after setting them, things look alright.. but now it is stuck at:
> 14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 918)
> 14/02/20 14:47:15 INFO TaskSetManager: Finished TID 919 in 563 ms on
> master (progress: 690/922)
> 14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 919)
> 14/02/20 14:47:15 INFO TaskSetManager: Finished TID 920 in 558 ms on
> master (progress: 691/922)
> 14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 920)
> 14/02/20 14:47:15 INFO TaskSetManager: Finished TID 916 in 770 ms on
> master (progress: 692/922)
> 14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 916)
>
>
> And I can't access the web UI because of the corporate firewall, etc.
> Any more ideas?
> Thanks
>
>
> On Thu, Feb 20, 2014 at 2:41 PM, Mayur Rustagi <ma...@gmail.com> wrote:
>
>> check the Spark UI; you'll see in the application area if it's working
>> there, else it's running locally.
>>
>> Mayur Rustagi
>> Ph: +919632149971
>> h <https://twitter.com/mayur_rustagi>ttp://www.sigmoidanalytics.com
>> https://twitter.com/mayur_rustagi
>>
>>
>>
>> On Thu, Feb 20, 2014 at 2:39 PM, Mohit Singh <mo...@gmail.com> wrote:
>>
>>> Cool. That checked out..
>>> But I still strongly feel that my simple line-count job is running
>>> locally and not on the cluster?
>>> Any suggestions?
>>> Thanks
>>>
>>>
>>> On Thu, Feb 20, 2014 at 1:45 PM, Mayur Rustagi <ma...@gmail.com> wrote:
>>>
>>>> netstat -an | grep 4040 to check the ports.
>>>> In all likelihood you are running a spark shell or something else in the
>>>> background.
>>>> [earlier quoted messages trimmed]
Re: Running spark on cluster
Posted by Mohit Singh <mo...@gmail.com>.
Ok.. That issue is fixed. I didn't set the master and port on the
SparkContext...
Now, after setting them, things look alright.. but now it is stuck at:
14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 918)
14/02/20 14:47:15 INFO TaskSetManager: Finished TID 919 in 563 ms on master
(progress: 690/922)
14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 919)
14/02/20 14:47:15 INFO TaskSetManager: Finished TID 920 in 558 ms on master
(progress: 691/922)
14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 920)
14/02/20 14:47:15 INFO TaskSetManager: Finished TID 916 in 770 ms on master
(progress: 692/922)
14/02/20 14:47:15 INFO DAGScheduler: Completed ResultTask(0, 916)
And I can't access the web UI because of the corporate firewall, etc.
Any more ideas?
Thanks
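One more console-only check: the driver log itself says which host each task finished on. If every TaskSetManager line says "on master" (as above) rather than node01..node07, only the master's worker is doing the work. A small tally helper, plain Python, with log lines abridged from this thread:

```python
import re
from collections import Counter

LOG = """\
14/02/20 14:47:15 INFO TaskSetManager: Finished TID 919 in 563 ms on master (progress: 690/922)
14/02/20 14:47:15 INFO TaskSetManager: Finished TID 920 in 558 ms on master (progress: 691/922)
14/02/20 14:47:15 INFO TaskSetManager: Finished TID 916 in 770 ms on master (progress: 692/922)
"""

def tasks_per_host(log):
    """Count finished tasks per executor host from TaskSetManager lines."""
    return Counter(re.findall(r"Finished TID \d+ in \d+ ms on (\S+)", log))

print(tasks_per_host(LOG))
# -> Counter({'master': 3})
```

A genuinely distributed run should show tasks spread across all the worker hostnames, not a single entry.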
On Thu, Feb 20, 2014 at 2:41 PM, Mayur Rustagi <ma...@gmail.com> wrote:
> check the Spark UI; you'll see in the application area if it's working
> there, else it's running locally.
> [earlier quoted messages trimmed]
--
Mohit
"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates
Re: Running spark on cluster
Posted by Mayur Rustagi <ma...@gmail.com>.
check the Spark UI; you'll see in the application area if it's working there,
else it's running locally.
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi
On Thu, Feb 20, 2014 at 2:39 PM, Mohit Singh <mo...@gmail.com> wrote:
> Cool. That checked out..
> But I still strongly feel that my simple line-count job is running locally
> and not on the cluster?
> Any suggestions?
> Thanks
> [earlier quoted messages trimmed]
>>> at
>>> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>> at java.lang.Thread.run(Thread.java:744)
>>> 14/02/20 12:59:45 WARN AbstractLifeCycle: FAILED
>>> org.eclipse.jetty.server.Server@37a8d4d2: java.net.BindException:
>>> Address already in use
>>> java.net.BindException: Address already in use
>>> at sun.nio.ch.Net.bind0(Native Method)
>>> at sun.nio.ch.Net.bind(Net.java:444)
>>> at sun.nio.ch.Net.bind(Net.java:436)
>>> at
>>> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
>>> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>>> at
>>> org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
>>> at
>>> org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
>>> at
>>> org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
>>> at
>>> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>>> at org.eclipse.jetty.server.Server.doStart(Server.java:286)
>>> at
>>> org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
>>> at
>>> org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:118)
>>> at
>>> org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
>>> at
>>> org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:118)
>>> at scala.util.Try$.apply(Try.scala:161)
>>> at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:118)
>>> at
>>> org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:129)
>>> at org.apache.spark.ui.SparkUI.bind(SparkUI.scala:57)
>>> at org.apache.spark.SparkContext.<init>(SparkContext.scala:159)
>>> at
>>> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:47)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>> Method)
>>> at
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>> at py4j.Gateway.invoke(Gateway.java:214)
>>> at
>>> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>>> at
>>> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>> at java.lang.Thread.run(Thread.java:744)
>>> 14/02/20 12:59:45 INFO JettyUtils: Failed to create UI at port, 4040.
>>> Trying again.
>>> 14/02/20 12:59:45 INFO JettyUtils: Error was:
>>> Failure(java.net.BindException: Address already in use)
>>>
>>> 14/02/20 12:59:45 INFO JettyUtils: Failed to create UI at port, 4040.
>>> Trying again.
>>> 14/02/20 12:59:45 INFO JettyUtils: Error was:
>>> Failure(java.net.BindException: Address already in use)
>>> 14/02/20 12:59:45 INFO SparkUI: Started Spark Web UI at
>>> http://master:4041
>>> 14/02/20 12:59:45 INFO AppClient$ClientActor: Connecting to master
>>> spark://master:7070...
>>> 14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
>>> akka.tcp://sparkMaster@master:7070:
>>> akka.remote.EndpointAssociationException: Association failed with
>>> [akka.tcp://sparkMaster@master:7070]
>>> 14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
>>> akka.tcp://sparkMaster@master:7070:
>>> akka.remote.EndpointAssociationException: Association failed with
>>> [akka.tcp://sparkMaster@master:7070]
>>> 14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
>>> akka.tcp://sparkMaster@master:7070:
>>> akka.remote.EndpointAssociationException: Association failed with
>>> [akka.tcp://sparkMaster@master:7070]
>>> 14/02/20 12:59:46 WARN AppClient$ClientActor: Could not connect to
>>> akka.tcp://sparkMaster@master:7070:
>>> akka.remote.EndpointAssociationException: Association failed with
>>> [akka.tcp://sparkMaster@master:7070]
>>>
>>>
>>>
>>> And after all this, the pyspark shell starts.
>>> And if I kill the process on port 4040, then things work just fine.
>>> Can someone help me with this?
>>> Thanks
>>>
>>>
>>> --
>>> Mohit
>>>
>>> "When you want success as badly as you want the air, then you will get
>>> it. There is no other secret of success."
>>> -Socrates
>>>
>>
>>
>
>
> --
> Mohit
>
> "When you want success as badly as you want the air, then you will get it.
> There is no other secret of success."
> -Socrates
>
Re: Running spark on cluster
Posted by Mohit Singh <mo...@gmail.com>.
Cool, that checked out.
But I still strongly suspect that my simple line-count job is running locally
and not on the cluster.
Any suggestions?
Thanks
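One clue in the startup log above is the repeated "Could not connect to akka.tcp://sparkMaster@master:7070" warning: nothing appears to be listening on that address. Note that the standalone master's RPC port defaults to 7077, not 7070, so it is worth probing both. A minimal sketch of such a probe — the hostname "master" and the ports are taken from the log and are assumptions about this particular setup:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# "master" and 7070 come from the log above; 7077 is the standalone
# master's default RPC port. Run this on a cluster node, e.g.:
# print(port_open("master", 7070), port_open("master", 7077))
```

If 7070 is closed but 7077 is open, restarting the shell with MASTER=spark://master:7077 would be the thing to try.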
--
Mohit
"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates
Re: Running spark on cluster
Posted by Mayur Rustagi <ma...@gmail.com>.
Check the ports: netstat -an | grep 4040.
In all likelihood you are running a Spark shell or something else in the
background.
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi
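The "Address already in use" failures followed by "Started Spark Web UI at http://master:4041" in the log show the web UI retrying on the next port when 4040 is taken. A minimal sketch of that retry pattern — an illustration of the behavior, not Spark's actual code:

```python
import socket

def bind_with_fallback(start_port, max_tries=5, host="0.0.0.0"):
    """Try to bind start_port; on "Address already in use", try the
    next port, up to max_tries ports. Mirrors the JettyUtils retry
    visible in the log above (4040 busy -> UI comes up on 4041)."""
    for offset in range(max_tries):
        port = start_port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock, port
        except OSError:
            sock.close()  # port in use; try the next one
    raise OSError("no free port in %d..%d" % (start_port, start_port + max_tries - 1))
```

To free 4040 instead of falling back, netstat -an | grep 4040 (as above) shows the listener; on systems with lsof, lsof -i :4040 also prints the PID of the process to kill.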