Posted to user@spark.apache.org by julyfire <he...@gmail.com> on 2014/09/25 04:51:52 UTC

Re: How to use FlumeInputDStream in spark cluster?

I have tested the FlumeEventCount example code on a standalone cluster, and this
is still a problem in Spark 1.1.0, the latest version as of now. Have you
solved this issue on your end?





Re: How to use FlumeInputDStream in spark cluster?

Posted by Ping Tang <pt...@aerohive.com>.
Thank you very much for your reply.

I have a cluster of 8 nodes: m1, m2, m3, ..., m8. m1 is configured as the Spark master node; the rest of the nodes are all worker nodes. I also configured m3 as the History Server, but the History Server fails to start. I ran FlumeEventCount on m1 using the right hostname and a port that is not used by any other application. Here is the script I used to run FlumeEventCount:


#!/bin/bash
# Submit the FlumeEventCount example in YARN client mode; the last two
# arguments are the host and port the Flume receiver should bind.
spark-submit --verbose \
  --class org.apache.spark.examples.streaming.FlumeEventCount \
  --deploy-mode client \
  --master yarn-client \
  --jars lib/spark-streaming-flume_2.10-1.1.0-cdh5.2.2-20141112.193826-1.jar \
  /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/examples/lib/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar \
  m1.ptang.aerohive.com 1234


The same issue was observed after adding "spark.ui.port=4321" to /etc/spark/conf/spark-defaults.conf, as sketched below.
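For reference, a minimal sketch of that spark-defaults.conf entry (the file is read as Java properties, so whitespace separation works as well as "="):

    # /etc/spark/conf/spark-defaults.conf
    spark.ui.port    4321

The following exceptions are from the job run: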


14/12/01 11:51:45 INFO JobScheduler: Finished job streaming job 1417463504000 ms.0 from job set of time 1417463504000 ms
14/12/01 11:51:45 INFO JobScheduler: Total delay: 1.465 s for time 1417463504000 ms (execution: 1.415 s)
14/12/01 11:51:45 ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - org.jboss.netty.channel.ChannelException: Failed to bind to: m1.ptang.aerohive.com/192.168.10.22:1234
        at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:119)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:74)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:68)
        at org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:164)
        at org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:171)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:257)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Cannot assign requested address
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        ... 3 more

14/12/01 11:51:45 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 72, m8.ptang.aerohive.com): org.jboss.netty.channel.ChannelException: Failed to bind to: m1.ptang.aerohive.com/192.168.10.22:1234
        org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:119)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:74)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:68)
        org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:164)
        org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:171)
        org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
        org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
        org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
        org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:257)
        org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)

Thanks

Ping

From: Prannoy <pr...@sigmoidanalytics.com>
Date: Friday, November 28, 2014 at 12:56 AM
To: "user@spark.incubator.apache.org" <us...@spark.incubator.apache.org>
Subject: Re: How to use FlumeInputDStream in spark cluster?

Hi,

A BindException comes when two processes are using the same port. In your Spark configuration, just set ("spark.ui.port", "xxxxx")
to some other port; xxxxx can be any free port number, say 12345. The BindException will not break your job in either case; to fix it, just change the port number.

Thanks.

On Fri, Nov 28, 2014 at 1:30 PM, pamtang [via Apache Spark User List] wrote:
I'm seeing the same issue on CDH 5.2 with Spark 1.1. FlumeEventCount works fine on a Standalone cluster but throws a BindException in YARN mode. Is there a solution to this problem, or will FlumeInputDStream not work in a cluster environment?




Re: How to use FlumeInputDStream in spark cluster?

Posted by Prannoy <pr...@sigmoidanalytics.com>.
Hi,

A BindException comes when two processes are using the same port. In your
Spark configuration, just set ("spark.ui.port", "xxxxx") to some other port;
xxxxx can be any free port number, say 12345. The BindException will not
break your job in either case; to fix it, just change the port number.
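
In case it helps, a minimal sketch of setting that property programmatically,
assuming a self-contained Scala driver (the object name, app name, and port
value here are illustrative, not from this thread):

import org.apache.spark.{SparkConf, SparkContext}

object UiPortExample {
  def main(args: Array[String]): Unit = {
    // Move the Spark web UI off its default port (4040) so it cannot clash
    // with another process on the same host; 12345 is an arbitrary free port.
    val conf = new SparkConf()
      .setAppName("UiPortExample")
      .set("spark.ui.port", "12345")
    val sc = new SparkContext(conf)
    // Trivial job so the application actually runs and exposes the UI.
    println(sc.parallelize(1 to 10).count())
    sc.stop()
  }
}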

Thanks.

On Fri, Nov 28, 2014 at 1:30 PM, pamtang [via Apache Spark User List] <
ml-node+s1001560n19997h9@n3.nabble.com> wrote:

> I'm seeing the same issue on CDH 5.2 with Spark 1.1. FlumeEventCount works
> fine on a Standalone cluster but throws a BindException in YARN mode. Is
> there a solution to this problem, or will FlumeInputDStream not work in a
> cluster environment?





Re: How to use FlumeInputDStream in spark cluster?

Posted by BigDataUser <su...@yahoo.com>.
I am running the FlumeEventCount program on CDH 5.0.1, which has Spark 0.9.0.
The program runs fine in local mode as well as in standalone cluster mode.
However, the program fails in YARN mode. I see the following error:

INFO scheduler.DAGScheduler: Stage 2 (runJob at NetworkInputTracker.scala:182) finished in 0.215 s
INFO spark.SparkContext: Job finished: runJob at NetworkInputTracker.scala:182, took 0.224696381 s
ERROR scheduler.NetworkInputTracker: De-registered receiver for network stream 0 with message org.jboss.netty.channel.ChannelException: Failed to bind to: xxxxx/xx.xxx.x.xx:41415

Is there a workaround for this?
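
For context (this is background, not a confirmed fix from the thread): with the
push-model FlumeUtils.createStream, the receiver opens its Avro server on the
given host:port inside whichever executor it is scheduled on, so on YARN the
bind fails whenever that executor's node does not own the address. A minimal
sketch under that assumption, binding all interfaces so any node can accept;
the object name and port are illustrative, and Flume must still be pointed at
whichever node actually hosts the receiver:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeBindSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeBindSketch")
    val ssc = new StreamingContext(conf, Seconds(2))
    // The receiver binds this host:port on the executor it lands on, not on
    // the driver. "0.0.0.0" binds all local interfaces of that executor's
    // node, avoiding "Cannot assign requested address" on arbitrary nodes.
    val events = FlumeUtils.createStream(ssc, "0.0.0.0", 41415)
    events.count().map(n => s"Received $n flume events.").print()
    ssc.start()
    ssc.awaitTermination()
  }
}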


