Posted to user@spark.apache.org by Hemanth Yamijala <yh...@gmail.com> on 2014/09/04 12:26:21 UTC

Spark processes not dying on killing corresponding YARN application

Hi,

I launched a Spark Streaming job under YARN with the default Spark
configuration, using spark-submit with the master set to yarn-cluster. It
launched an ApplicationMaster and 2 CoarseGrainedExecutorBackend processes.

Everything ran fine; then I killed the application using yarn application
-kill <appid>.
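
For reference, the commands were roughly of this shape (the class, jar and
application id below are placeholders, not the actual ones):

  # submit the streaming job in yarn-cluster mode
  spark-submit --master yarn-cluster \
    --class com.example.MyStreamingJob \
    my-streaming-job.jar

  # later, kill the running application by its YARN application id
  yarn application -kill application_1409812345678_0001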

On doing this, I noticed that it killed only the shell processes that launch
the Spark AM and the other processes, but the Java processes themselves were
left alone. They became orphaned and their PPID changed to 1.
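
A quick way to see the leftovers (a rough check; the grep pattern is just an
example and may need adjusting):

  # list the Spark JVMs with their parent PIDs; after the kill, the AM and
  # executor JVMs show up re-parented to PPID 1
  ps -eo pid,ppid,args | grep -E 'ApplicationMaster|CoarseGrainedExecutorBackend' | grep -v grep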

Is this a bug in Spark or YARN? I am using Spark 1.0.2 and Hadoop 2.4.1.
The cluster is a single-node setup in pseudo-distributed mode.

Thanks
hemanth

Re: Spark processes not dying on killing corresponding YARN application

Posted by didata <su...@didata.us>.
I figured out this issue (in our case) ... and I'll vent a little in my
reply here... =:)

Fedora's well-intentioned firewall (firewall-cmd) requires you to open
(enable) any port/service on a host that you need to connect to (including
SSH/22, which is enabled by default, of course). So when launching client
applications that use ephemeral ports to connect back to (as a Spark app
does for the remote YARN ResourceManager/NodeManagers to connect back to),
you can't know what that port will be in order to enable it, unless the
application lets you specify it as a launch property (which you can for
Spark apps via -Dspark.driver.port="NNNNN"). Again, well intentioned, but
always a pain.

So... you have to either disable the firewall capability in Fedora, or
open/enable a range of ports and tell your applications to use one of those.

Also note that, as of this writing, firewall-cmd's ability to port-forward
from the HOST to GUESTS in Libvirt/KVM-based Hadoop/YARN/HDFS test/dev
clusters doesn't work (it never has -- it's on the TODO list). It's another
capability that you'll need in order to reach daemon ports running *inside*
the KVM cluster (for example, UI ports). The work-around here (besides,
again, disabling the Fedora firewall altogether) is to use same-subnet
BRIDGING (not NAT-ting). Doing that eliminates the need for port-forwarding
(which, again, doesn't work). I've filed bugs in the past for this.

So that is why YARN applications weren't terminating correctly for Spark
apps, or for that matter working at all, since Spark uses ephemeral ports
(by necessity).

So whatever port your Spark application uses, remember to issue the command:

  user@driverHost$ sudo firewall-cmd --zone=public --add-port=<SparkAppPort>/tcp

or, better yet, use a port-deterministic strategy as mentioned earlier.
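
For example, a rough sequence on the driver host could look like this (port
40000 and the 40000-40010 range are only examples, and I'm assuming
firewalld's default "public" zone):

  # pin the driver to a known port instead of an ephemeral one
  # (same idea as the -Dspark.driver.port property mentioned above)
  echo "spark.driver.port 40000" >> $SPARK_HOME/conf/spark-defaults.conf

  # open that port (or a small range) in firewalld
  sudo firewall-cmd --zone=public --add-port=40000/tcp
  sudo firewall-cmd --zone=public --add-port=40000-40010/tcp        # range form
  sudo firewall-cmd --permanent --zone=public --add-port=40000/tcp  # persist across reloads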

(Hopefully the verbosity here will help someone in their future search.
Fedora aside, the original problem here can be network related, as I
discovered.)

Sincerely,
didata




Re: Spark processes not dying on killing corresponding YARN application

Posted by didata <su...@didata.us>.
Thanks for asking this.

I've had this issue with pyspark on YARN too, 100% of the time: I quit out
of pyspark and, while my Unix shell prompt returns, a 'yarn application
-list' always shows (as does the UI) that the application is still running
(or at least not totally dead). When I then log onto the nodemanagers, I see
orphaned/defunct UNIX processes.


I don't know if those are caused by simply exiting pyspark, or by my having
to do a 'yarn application -kill <appID>' to kill applications that should
have terminated gracefully (but didn't).


I recognized this problem about a week ago and haven't gotten back to it
(it looks like today I will =:)), but yes, I see that same issue.
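
For what it's worth, here's roughly how I check for the leftovers (the
application id is a placeholder, and the grep pattern may need adjusting):

  # see whether YARN still thinks the application is running
  yarn application -list

  # force-kill it if it never leaves the RUNNING state
  yarn application -kill application_1409812345678_0002

  # on each nodemanager, look for orphaned or defunct Spark JVMs
  # (re-parented to PPID 1, or in a Z/defunct state)
  ps -eo pid,ppid,stat,args | grep -E 'CoarseGrainedExecutorBackend|pyspark' | grep -v grep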

I am using Spark 1.0.0 (latest from Cloudera), and the latest YARN from 
Cloudera as well (I forget the exact version at the moment).



Sincerely yours,
Team Dimension Data


