Posted to user@spark.apache.org by Luis Ángel Vicente Sánchez <la...@gmail.com> on 2015/07/13 18:15:25 UTC

Problems after upgrading to Spark 1.4.0

I have just upgraded one of my Spark jobs from Spark 1.2.1 to Spark 1.4.0
and, after deploying it to Mesos, it's not working anymore.

The upgrade process was quite easy:

- Create a new Docker container for Spark 1.4.0.
- Upgrade the Spark job to use Spark 1.4.0 as a dependency and build a new
fat jar (see the build sketch below).
- Create a Docker container for the job, based on the Spark 1.4.0
container from the first step.
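
Concretely, step 2 was little more than bumping the version in the build,
along these lines (a minimal build.sbt sketch; the project name, Scala
version and exact module list are illustrative, not the real build):

// build.sbt -- minimal sketch of the dependency bump in step 2.
// Spark artifacts are marked "provided" because the executors fetch
// the Spark 1.4.0 distribution themselves via spark.executor.uri.
name := "my-spark-job"       // illustrative

scalaVersion := "2.10.5"     // illustrative

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"            % "1.4.0" % "provided",  // was 1.2.1
  "org.apache.spark" %% "spark-streaming"       % "1.4.0" % "provided",  // was 1.2.1
  "org.apache.spark" %% "spark-streaming-kafka" % "1.4.0"                // receiver-based Kafka input
)

The fat jar is then rebuilt with whatever assembly plugin the project uses.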

After deploying it to Marathon, the job only shows the driver under
Executors and no task makes progress. I haven't made any changes to my
config files (apart from updating spark.executor.uri to point to the right
file on S3).
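
For reference, that one change looks roughly like this in the job's
configuration (a minimal sketch; the app name is illustrative, and the URI
is the same tarball the stderr log below shows being fetched):

// Minimal sketch: point the Mesos executors at the Spark 1.4.0 tarball.
// The app name is illustrative; the URI matches the one in the log below.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("my-streaming-job")  // illustrative
  .set("spark.executor.uri",
    "http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz")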

If I go to Mesos and check my job under Frameworks, I can see a few failed
stages; the content of stderr always looks like this:

I0713 15:59:45.774368  1327 fetcher.cpp:214] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz'
I0713 15:59:45.774483  1327 fetcher.cpp:125] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' with os::net
I0713 15:59:45.774494  1327 fetcher.cpp:135] Downloading 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' to '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz'
I0713 15:59:50.700959  1327 fetcher.cpp:78] Extracted resource '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz' into '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58'
I0713 15:59:50.973274  1333 exec.cpp:132] Version: 0.22.1
I0713 15:59:50.998219  1339 exec.cpp:206] Executor registered on slave 20150713-133618-421011372-5050-8867-S5
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/13 15:59:51 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/07/13 15:59:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/13 15:59:52 INFO SecurityManager: Changing view acls to: root
15/07/13 15:59:52 INFO SecurityManager: Changing modify acls to: root
15/07/13 15:59:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/13 15:59:52 INFO Slf4jLogger: Slf4jLogger started
15/07/13 15:59:52 INFO Remoting: Starting remoting
15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@int-mesos-slave-ib4583253.mclabs.io:41854]
15/07/13 15:59:53 INFO Utils: Successfully started service 'driverPropsFetcher' on port 41854.
15/07/13 15:59:53 INFO SecurityManager: Changing view acls to: root
15/07/13 15:59:53 INFO SecurityManager: Changing modify acls to: root
15/07/13 15:59:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/07/13 15:59:53 INFO Slf4jLogger: Slf4jLogger started
15/07/13 15:59:53 INFO Remoting: Starting remoting
15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
15/07/13 15:59:53 INFO Utils: Successfully started service 'sparkExecutor' on port 60219.
15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@int-mesos-slave-ib4583253.mclabs.io:60219]
15/07/13 15:59:53 INFO DiskBlockManager: Created local directory at /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87
15/07/13 15:59:53 INFO MemoryStore: MemoryStore started with capacity 267.5 MB
Exception in thread "main" java.io.FileNotFoundException: /etc/mindcandy/metrics.properties (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at java.io.FileInputStream.<init>(FileInputStream.java:93)
	at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
	at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:50)
	at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:93)
	at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:222)
	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:367)
	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:180)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
15/07/13 15:59:53 INFO DiskBlockManager: Shutdown hook called
15/07/13 15:59:53 INFO Utils: path = /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87, already present as root for deletion.
15/07/13 15:59:53 INFO Utils: Shutdown hook called
15/07/13 15:59:53 INFO Utils: Deleting directory /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439

Re: Problems after upgrading to Spark 1.4.0

Posted by Luis Ángel Vicente Sánchez <la...@gmail.com>.
I have just restarted the job and it doesn't seem that the shutdown hook is
being executed. I have attached the driver log to this email. It seems
that the slaves are not accepting the tasks... but we haven't changed
anything on our Mesos cluster; we have only upgraded one job to Spark 1.4.
Is there any config option that has been added and is now mandatory?

Re: Problems after upgrading to Spark 1.4.0

Posted by Tathagata Das <td...@databricks.com>.
Spark 1.4.0 added shutdown hooks in the driver to cleanly shut down the
SparkContext, which in turn shuts down the executors. I am not sure
whether this is related, but somehow the executor's shutdown hook is being
called.
Can you check the driver logs to see if the driver's shutdown hook is
accidentally being called?
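
One way to check: a shutdown hook is just a registered JVM thread, so
something like this in the driver (a minimal, purely illustrative sketch,
not Spark's internal hook) will log if and when the driver JVM starts
shutting down:

// Purely illustrative: log when the driver JVM begins shutdown.
// Register near SparkContext creation; this is a plain JVM hook,
// not the one Spark itself installs.
Runtime.getRuntime.addShutdownHook(new Thread {
  override def run(): Unit =
    System.err.println("driver JVM shutdown hook fired at " + System.currentTimeMillis)
})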


Re: Problems after upgrading to Spark 1.4.0

Posted by Luis Ángel Vicente Sánchez <la...@gmail.com>.
I forgot to mention that this is a long-running job, actually a Spark
Streaming job, and it's using Mesos coarse-grained mode. I'm still using
the unreliable Kafka receiver.
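
For clarity, by the unreliable receiver I mean the receiver-based API
rather than the direct stream, roughly like this (a minimal sketch; the
ZooKeeper quorum, consumer group and topic names are illustrative):

// Minimal sketch of the receiver-based ("unreliable") Kafka input.
// The ZooKeeper quorum, consumer group and topic map are illustrative.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("my-streaming-job")
val ssc = new StreamingContext(conf, Seconds(10))
val messages = KafkaUtils
  .createStream(ssc, "zk1:2181", "my-consumer-group", Map("my-topic" -> 1))
  .map(_._2)  // keep only the message payload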
