Posted to issues@spark.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2014/11/06 18:38:33 UTC

[jira] [Resolved] (SPARK-798) AMI: ami-530f7a3a and Mesos

     [ https://issues.apache.org/jira/browse/SPARK-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia resolved SPARK-798.
---------------------------------
    Resolution: Won't Fix

This is now about a pretty old AMI, so I'll close it. New versions of Spark use newer versions of Mesos.

> AMI: ami-530f7a3a and Mesos
> ---------------------------
>
>                 Key: SPARK-798
>                 URL: https://issues.apache.org/jira/browse/SPARK-798
>             Project: Spark
>          Issue Type: Bug
>          Components: EC2
>    Affects Versions: 0.7.2
>            Reporter: Alexander Albul
>
> Hi,
> I have been seeing some strange problems since the new version of the AMI came out.
> The problem is that when I create a Mesos cluster, I can't use it.
> I mean, the spark-shell is frozen most of the time.
> Even when it is not frozen, the scheduled tasks hang.
> Steps to reproduce:
> 1) Start cluster: ./spark-ec2 -s 1 -w 200 -i [identity] -k [key-pair] --cluster-type=mesos launch spark-aalbul
> 2) SSH to the master node
> 3) Go to the "spark" directory
> 4) Execute: MASTER=`cat ~/spark-ec2/cluster-url` ./spark-shell
> Problems:
> 1) The most recent problem is that the Spark shell is unable to finish starting up; it hangs after this output:
> {code}
> Welcome to
>       ____              __  
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 0.7.2
>       /_/                  
> Using Scala version 2.9.3 (OpenJDK 64-Bit Server VM, Java 1.7.0_25)
> Initializing interpreter...
> 13/07/10 12:14:11 INFO server.Server: jetty-7.6.8.v20121106
> 13/07/10 12:14:11 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33601
> Creating SparkContext...
> 13/07/10 12:14:21 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
> 13/07/10 12:14:22 INFO spark.SparkEnv: Registering BlockManagerMaster
> 13/07/10 12:14:22 INFO storage.MemoryStore: MemoryStore started with capacity 3.8 GB.
> 13/07/10 12:14:22 INFO storage.DiskStore: Created local directory at /mnt/spark/spark-local-20130710121422-61da
> 13/07/10 12:14:22 INFO storage.DiskStore: Created local directory at /mnt2/spark/spark-local-20130710121422-07d2
> 13/07/10 12:14:22 INFO network.ConnectionManager: Bound socket to port 49473 with id = ConnectionManagerId(ip-10-46-37-82.ec2.internal,49473)
> 13/07/10 12:14:22 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 13/07/10 12:14:22 INFO storage.BlockManagerMaster: Registered BlockManager
> 13/07/10 12:14:22 INFO server.Server: jetty-7.6.8.v20121106
> 13/07/10 12:14:22 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50518
> 13/07/10 12:14:22 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.46.37.82:50518
> 13/07/10 12:14:22 INFO spark.SparkEnv: Registering MapOutputTracker
> 13/07/10 12:14:22 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-6d5266c8-c9f8-4db6-958c-e4791bd8a81d
> 13/07/10 12:14:22 INFO server.Server: jetty-7.6.8.v20121106
> 13/07/10 12:14:22 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:47677
> 13/07/10 12:14:22 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0' started
> 13/07/10 12:14:23 INFO server.HttpServer: akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:35347
> 13/07/10 12:14:23 INFO storage.BlockManagerUI: Started BlockManager web UI at http://ip-10-46-37-82.ec2.internal:35347
> {code}
> When I run jstack on this process, I see that one of the threads is stuck trying to load the Mesos native library:
> {code}
> "main" prio=10 tid=0x00007fcfc800c000 nid=0x761 runnable [0x00007fcfcefd5000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
> 	at java.lang.ClassLoader.loadLibrary1(ClassLoader.java:1957)
> 	- locked <0x000000077fcb4fb8> (a java.util.Vector)
> 	- locked <0x000000077fd2b348> (a java.util.Vector)
> 	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1882)
> 	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1843)
> 	at java.lang.Runtime.load0(Runtime.java:795)
> 	- locked <0x000000077fcb66d8> (a java.lang.Runtime)
> 	at java.lang.System.load(System.java:1061)
> 	at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:38)
> 	at spark.executor.MesosExecutorBackend$.main(MesosExecutorBackend.scala:73)
> 	at spark.executor.MesosExecutorBackend.main(MesosExecutorBackend.scala)
> {code}
> 2) Scheduled tasks do not finish.
> Even when the shell does start (a rare case), I see this:
> {code}
> Spark context available as sc.
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> sc.parallelize(List(1,2,3))
> res0: spark.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:13
> scala> res0.collect()
> 13/07/11 09:06:59 INFO spark.SparkContext: Starting job: collect at <console>:15
> 13/07/11 09:06:59 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:15) with 8 output partitions (allowLocal=false)
> 13/07/11 09:06:59 INFO scheduler.DAGScheduler: Final stage: Stage 0 (parallelize at <console>:13)
> 13/07/11 09:06:59 INFO scheduler.DAGScheduler: Parents of final stage: List()
> 13/07/11 09:06:59 INFO scheduler.DAGScheduler: Missing parents: List()
> 13/07/11 09:06:59 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13), which has no missing parents
> 13/07/11 09:06:59 INFO scheduler.DAGScheduler: Submitting 8 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at <console>:13)
> 13/07/11 09:06:59 INFO cluster.ClusterScheduler: Adding task set 0.0 with 8 tasks
> 13/07/11 09:06:59 INFO cluster.TaskSetManager: Starting task 0.0:0 as TID 0 on executor 201307110852-2871257866-5050-2301-0: ip-10-170-21-54.ec2.internal (preferred)
> 13/07/11 09:07:00 INFO cluster.TaskSetManager: Serialized task 0.0:0 as 975 bytes in 141 ms
> 13/07/11 09:07:00 INFO cluster.TaskSetManager: Starting task 0.0:1 as TID 1 on executor 201307110852-2871257866-5050-2301-0: ip-10-170-21-54.ec2.internal (preferred)
> 13/07/11 09:07:00 INFO cluster.TaskSetManager: Serialized task 0.0:1 as 975 bytes in 0 ms
> {code}
> jstack tells me the same thing: it is stuck trying to load the Mesos native library.
> BTW:
> We downgraded to the previous version of the AMI (unfortunately, I do not remember its ID), updated Java to 1.7 along with Spark and Shark, and now everything works like a charm.
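Both jstack traces above bottom out in MesosNativeLibrary.load, which ultimately hands a libmesos path to System.load. A minimal diagnostic sketch of that resolution step is below; the MESOS_NATIVE_LIBRARY environment variable is the one Spark-on-Mesos setups of that era used, but the fallback path is an illustrative assumption, not the exact Spark 0.7.2 logic. Checking whether the file exists first helps distinguish a missing library on the new AMI from a System.load call that stalls inside the native loader.

```java
import java.io.File;

// Hedged sketch: roughly how the Mesos native library path is resolved
// before being passed to System.load(). The default path below is an
// assumption for illustration; real deployments set MESOS_NATIVE_LIBRARY.
public class MesosLibCheck {
    static String resolveLibraryPath() {
        String path = System.getenv("MESOS_NATIVE_LIBRARY");
        // Fall back to a common install location (assumed, not from Spark source).
        return (path != null) ? path : "/usr/local/lib/libmesos.so";
    }

    public static void main(String[] args) {
        String path = resolveLibraryPath();
        // System.load(path) is where the reporter's jstack shows the JVM
        // blocked; probing the file first separates "library missing"
        // from "native load stalled on its dependencies".
        if (new File(path).exists()) {
            System.out.println("found: " + path);
        } else {
            System.out.println("missing: " + path);
        }
    }
}
```

Running this on the master and on an executor node would quickly show whether the new AMI shipped libmesos at a different path (or not at all), which would explain both the frozen shell and the stuck tasks.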



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
