Posted to user@spark.apache.org by Jonathan Kelly <jo...@gmail.com> on 2016/06/18 01:36:54 UTC

Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
(commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
log4j.properties is not getting picked up in the executor classpath (and
driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file
is taking precedence in the YARN containers.

Spark's log4j.properties file is correctly being bundled into the
__spark_conf__.zip file and getting added to the DistributedCache, but it
is not in the classpath of the executor, as evidenced by the following
command, which I ran in spark-shell:

scala> sc.parallelize(Seq(1)).map(_ =>
getClass().getResource("/log4j.properties")).first
res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties

I then ran the following in spark-shell to verify the classpath of the
executors:

scala> sc.parallelize(Seq(1)).map(_ =>
System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e =>
!e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
...
/mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
/mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
/etc/hadoop/conf
...

So the JVM has this nonexistent __spark_conf__ directory in the classpath
when it should really be __spark_conf__.zip (which is actually a symlink to
a directory, despite the .zip filename).
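
A quick way to double-check which of those classpath entries actually exist on disk would be something along these lines (a rough sketch in the same spirit as the command above, untested):

scala> sc.parallelize(Seq(1)).flatMap { _ =>
         // Split the executor's classpath, keep only directory-style entries,
         // and record whether each one is actually present on local disk.
         System.getProperty("java.class.path").split(':')
           .filter(e => !e.endsWith(".jar") && !e.endsWith("*"))
           .map(e => (e, new java.io.File(e).exists))
       }.collect.foreach(println)

Presumably the __spark_conf__ entry would come back false, since the listing below only contains __spark_conf__.zip: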

% sudo ls -l
/mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
total 20
-rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
-rwx------ 1 yarn yarn  594 Jun 18 01:26
default_container_executor_session.sh
-rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
-rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip ->
/mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
/mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp

Does anybody know why this is happening? Is this a bug in Spark, or is it
the JVM doing this (possibly because the extension is .zip)?
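
For what it's worth, the JVM itself never complains about a classpath entry that doesn't exist; it just skips it, so the only visible symptom is log4j quietly picking up Hadoop's copy instead. A tiny sketch of that behavior (nothing Spark-specific, and the path below is made up):

import java.io.File
import java.net.URLClassLoader

// A classpath entry pointing at a nonexistent directory produces no error at all;
// resource lookups against it simply return null.
val missingDir = new File("/tmp/no-such-__spark_conf__")              // hypothetical path
val loader = new URLClassLoader(Array(missingDir.toURI.toURL), null)  // no parent loader
println(loader.getResource("log4j.properties"))                       // prints null, no exception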

Thanks,
Jonathan

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

Posted by Jonathan Kelly <jo...@gmail.com>.
OK, JIRA created: https://issues.apache.org/jira/browse/SPARK-16080

Also, after looking at the code a bit I think I see the reason. If I'm
correct, it may actually be a very easy fix.

On Mon, Jun 20, 2016 at 1:21 PM Marcelo Vanzin <va...@cloudera.com> wrote:

> It doesn't hurt to have a bug tracking it, in case anyone else has
> time to look at it before I do.
>
> On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly <jo...@gmail.com> wrote:
> > Thanks for the confirmation! Shall I cut a JIRA issue?
> >
> > On Mon, Jun 20, 2016 at 10:42 AM Marcelo Vanzin <va...@cloudera.com> wrote:
> >>
> >> I just tried this locally and can see the wrong behavior you mention.
> >> I'm running a somewhat old build of 2.0, but I'll take a look.
> >>
> >> On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly <jonathakamzn@gmail.com> wrote:
> >> > Does anybody have any thoughts on this?
> >> >
> >> > On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jonathakamzn@gmail.com> wrote:
> >> >>
> >> >> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit
> >> >> bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's log4j.properties
> >> >> is not getting picked up in the executor classpath (and driver classpath
> >> >> for yarn-cluster mode), so Hadoop's log4j.properties file is taking
> >> >> precedence in the YARN containers.
> >> >>
> >> >> Spark's log4j.properties file is correctly being bundled into the
> >> >> __spark_conf__.zip file and getting added to the DistributedCache, but it
> >> >> is not in the classpath of the executor, as evidenced by the following
> >> >> command, which I ran in spark-shell:
> >> >>
> >> >> scala> sc.parallelize(Seq(1)).map(_ =>
> >> >> getClass().getResource("/log4j.properties")).first
> >> >> res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
> >> >>
> >> >> I then ran the following in spark-shell to verify the classpath of the
> >> >> executors:
> >> >>
> >> >> scala> sc.parallelize(Seq(1)).map(_ =>
> >> >> System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e =>
> >> >> !e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
> >> >> ...
> >> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
> >> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
> >> >> /etc/hadoop/conf
> >> >> ...
> >> >>
> >> >> So the JVM has this nonexistent __spark_conf__ directory in the classpath
> >> >> when it should really be __spark_conf__.zip (which is actually a symlink
> >> >> to a directory, despite the .zip filename).
> >> >>
> >> >> % sudo ls -l
> >> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
> >> >> total 20
> >> >> -rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
> >> >> -rwx------ 1 yarn yarn  594 Jun 18 01:26 default_container_executor_session.sh
> >> >> -rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
> >> >> -rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
> >> >> lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip ->
> >> >> /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
> >> >> lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
> >> >> /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
> >> >> drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
> >> >>
> >> >> Does anybody know why this is happening? Is this a bug in Spark, or is it
> >> >> the JVM doing this (possibly because the extension is .zip)?
> >> >>
> >> >> Thanks,
> >> >> Jonathan
> >>
> >>
> >> --
> >> Marcelo
>
>
> --
> Marcelo
>

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

Posted by Marcelo Vanzin <va...@cloudera.com>.
It doesn't hurt to have a bug tracking it, in case anyone else has
time to look at it before I do.

On Mon, Jun 20, 2016 at 1:20 PM, Jonathan Kelly <jo...@gmail.com> wrote:
> Thanks for the confirmation! Shall I cut a JIRA issue?
>
> On Mon, Jun 20, 2016 at 10:42 AM Marcelo Vanzin <va...@cloudera.com> wrote:
>>
>> I just tried this locally and can see the wrong behavior you mention.
>> I'm running a somewhat old build of 2.0, but I'll take a look.
>>
>> On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly <jo...@gmail.com>
>> wrote:
>> > Does anybody have any thoughts on this?
>> >
>> > On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jo...@gmail.com>
>> > wrote:
>> >>
>> >> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit
>> >> bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
>> >> log4j.properties is
>> >> not getting picked up in the executor classpath (and driver classpath
>> >> for
>> >> yarn-cluster mode), so Hadoop's log4j.properties file is taking
>> >> precedence
>> >> in the YARN containers.
>> >>
>> >> Spark's log4j.properties file is correctly being bundled into the
>> >> __spark_conf__.zip file and getting added to the DistributedCache, but
>> >> it is
>> >> not in the classpath of the executor, as evidenced by the following
>> >> command,
>> >> which I ran in spark-shell:
>> >>
>> >> scala> sc.parallelize(Seq(1)).map(_ =>
>> >> getClass().getResource("/log4j.properties")).first
>> >> res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
>> >>
>> >> I then ran the following in spark-shell to verify the classpath of the
>> >> executors:
>> >>
>> >> scala> sc.parallelize(Seq(1)).map(_ =>
>> >> System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e
>> >> =>
>> >> !e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
>> >> ...
>> >>
>> >>
>> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>> >>
>> >>
>> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
>> >> /etc/hadoop/conf
>> >> ...
>> >>
>> >> So the JVM has this nonexistent __spark_conf__ directory in the
>> >> classpath
>> >> when it should really be __spark_conf__.zip (which is actually a
>> >> symlink to
>> >> a directory, despite the .zip filename).
>> >>
>> >> % sudo ls -l
>> >>
>> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>> >> total 20
>> >> -rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
>> >> -rwx------ 1 yarn yarn  594 Jun 18 01:26
>> >> default_container_executor_session.sh
>> >> -rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
>> >> -rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
>> >> lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip ->
>> >> /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
>> >> lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
>> >>
>> >> /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
>> >> drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
>> >>
>> >> Does anybody know why this is happening? Is this a bug in Spark, or is
>> >> it
>> >> the JVM doing this (possibly because the extension is .zip)?
>> >>
>> >> Thanks,
>> >> Jonathan
>>
>>
>>
>> --
>> Marcelo



-- 
Marcelo

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

Posted by Jonathan Kelly <jo...@gmail.com>.
Thanks for the confirmation! Shall I cut a JIRA issue?

On Mon, Jun 20, 2016 at 10:42 AM Marcelo Vanzin <va...@cloudera.com> wrote:

> I just tried this locally and can see the wrong behavior you mention.
> I'm running a somewhat old build of 2.0, but I'll take a look.
>
> On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly <jo...@gmail.com> wrote:
> > Does anybody have any thoughts on this?
> >
> > On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jo...@gmail.com> wrote:
> >>
> >> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit
> >> bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's log4j.properties
> >> is not getting picked up in the executor classpath (and driver classpath
> >> for yarn-cluster mode), so Hadoop's log4j.properties file is taking
> >> precedence in the YARN containers.
> >>
> >> Spark's log4j.properties file is correctly being bundled into the
> >> __spark_conf__.zip file and getting added to the DistributedCache, but it
> >> is not in the classpath of the executor, as evidenced by the following
> >> command, which I ran in spark-shell:
> >>
> >> scala> sc.parallelize(Seq(1)).map(_ =>
> >> getClass().getResource("/log4j.properties")).first
> >> res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
> >>
> >> I then ran the following in spark-shell to verify the classpath of the
> >> executors:
> >>
> >> scala> sc.parallelize(Seq(1)).map(_ =>
> >> System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e =>
> >> !e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
> >> ...
> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
> >> /etc/hadoop/conf
> >> ...
> >>
> >> So the JVM has this nonexistent __spark_conf__ directory in the classpath
> >> when it should really be __spark_conf__.zip (which is actually a symlink
> >> to a directory, despite the .zip filename).
> >>
> >> % sudo ls -l
> >> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
> >> total 20
> >> -rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
> >> -rwx------ 1 yarn yarn  594 Jun 18 01:26 default_container_executor_session.sh
> >> -rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
> >> -rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
> >> lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip ->
> >> /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
> >> lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
> >> /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
> >> drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
> >>
> >> Does anybody know why this is happening? Is this a bug in Spark, or is it
> >> the JVM doing this (possibly because the extension is .zip)?
> >>
> >> Thanks,
> >> Jonathan
>
>
> --
> Marcelo
>

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

Posted by Marcelo Vanzin <va...@cloudera.com>.
I just tried this locally and can see the wrong behavior you mention.
I'm running a somewhat old build of 2.0, but I'll take a look.

On Mon, Jun 20, 2016 at 7:04 AM, Jonathan Kelly <jo...@gmail.com> wrote:
> Does anybody have any thoughts on this?
>
> On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jo...@gmail.com>
> wrote:
>>
>> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT (commit
>> bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's log4j.properties is
>> not getting picked up in the executor classpath (and driver classpath for
>> yarn-cluster mode), so Hadoop's log4j.properties file is taking precedence
>> in the YARN containers.
>>
>> Spark's log4j.properties file is correctly being bundled into the
>> __spark_conf__.zip file and getting added to the DistributedCache, but it is
>> not in the classpath of the executor, as evidenced by the following command,
>> which I ran in spark-shell:
>>
>> scala> sc.parallelize(Seq(1)).map(_ =>
>> getClass().getResource("/log4j.properties")).first
>> res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
>>
>> I then ran the following in spark-shell to verify the classpath of the
>> executors:
>>
>> scala> sc.parallelize(Seq(1)).map(_ =>
>> System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e =>
>> !e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
>> ...
>>
>> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>>
>> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
>> /etc/hadoop/conf
>> ...
>>
>> So the JVM has this nonexistent __spark_conf__ directory in the classpath
>> when it should really be __spark_conf__.zip (which is actually a symlink to
>> a directory, despite the .zip filename).
>>
>> % sudo ls -l
>> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>> total 20
>> -rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
>> -rwx------ 1 yarn yarn  594 Jun 18 01:26
>> default_container_executor_session.sh
>> -rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
>> -rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
>> lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip ->
>> /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
>> lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
>> /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
>> drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
>>
>> Does anybody know why this is happening? Is this a bug in Spark, or is it
>> the JVM doing this (possibly because the extension is .zip)?
>>
>> Thanks,
>> Jonathan



-- 
Marcelo

Re: Spark 2.0 on YARN - Files in config archive not ending up on executor classpath

Posted by Jonathan Kelly <jo...@gmail.com>.
Does anybody have any thoughts on this?
On Fri, Jun 17, 2016 at 6:36 PM Jonathan Kelly <jo...@gmail.com>
wrote:

> I'm trying to debug a problem in Spark 2.0.0-SNAPSHOT
> (commit bdf5fe4143e5a1a393d97d0030e76d35791ee248) where Spark's
> log4j.properties is not getting picked up in the executor classpath (and
> driver classpath for yarn-cluster mode), so Hadoop's log4j.properties file
> is taking precedence in the YARN containers.
>
> Spark's log4j.properties file is correctly being bundled into the
> __spark_conf__.zip file and getting added to the DistributedCache, but it
> is not in the classpath of the executor, as evidenced by the following
> command, which I ran in spark-shell:
>
> scala> sc.parallelize(Seq(1)).map(_ =>
> getClass().getResource("/log4j.properties")).first
> res3: java.net.URL = file:/etc/hadoop/conf.empty/log4j.properties
>
> I then ran the following in spark-shell to verify the classpath of the
> executors:
>
> scala> sc.parallelize(Seq(1)).map(_ =>
> System.getProperty("java.class.path")).flatMap(_.split(':')).filter(e =>
> !e.endsWith(".jar") && !e.endsWith("*")).collect.foreach(println)
> ...
>
> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
>
> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003/__spark_conf__
> /etc/hadoop/conf
> ...
>
> So the JVM has this nonexistent __spark_conf__ directory in the classpath
> when it should really be __spark_conf__.zip (which is actually a symlink
> to a directory, despite the .zip filename).
>
> % sudo ls -l
> /mnt/yarn/usercache/hadoop/appcache/application_1466208403287_0003/container_1466208403287_0003_01_000003
> total 20
> -rw-r--r-- 1 yarn yarn   88 Jun 18 01:26 container_tokens
> -rwx------ 1 yarn yarn  594 Jun 18 01:26
> default_container_executor_session.sh
> -rwx------ 1 yarn yarn  648 Jun 18 01:26 default_container_executor.sh
> -rwx------ 1 yarn yarn 4419 Jun 18 01:26 launch_container.sh
> lrwxrwxrwx 1 yarn yarn   59 Jun 18 01:26 __spark_conf__.zip ->
> /mnt1/yarn/usercache/hadoop/filecache/17/__spark_conf__.zip
> lrwxrwxrwx 1 yarn yarn   77 Jun 18 01:26 __spark_libs__ ->
> /mnt/yarn/usercache/hadoop/filecache/16/__spark_libs__4490748779530764463.zip
> drwx--x--- 2 yarn yarn   46 Jun 18 01:26 tmp
>
> Does anybody know why this is happening? Is this a bug in Spark, or is it
> the JVM doing this (possibly because the extension is .zip)?
>
> Thanks,
> Jonathan
>
