Posted to user@spark.apache.org by yh18190 <yh...@gmail.com> on 2014/04/02 22:01:49 UTC

Need suggestions

Hi Guys,

Currently I am facing this issue and am not able to find the error.
Here is the sbt file:
name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.3"

resolvers += "bintray/meetup" at "http://dl.bintray.com/meetup/maven"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

resolvers += "Cloudera Repository" at
"https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies += "org.apache.spark" %% "spark-core" %
"0.9.0-incubating"

libraryDependencies += "com.cloudphysics" % "jerkson_2.10" % "0.6.3"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
"2.0.0-mr1-cdh4.6.0"

retrieveManaged := true

And here is the output:

[error] (run-main) org.apache.spark.SparkException: Job aborted: Task 2.0:2
failed 4 times (most recent failure: Exception failure:
java.lang.NoClassDefFoundError: Could not initialize class
main.scala.Utils$)
org.apache.spark.SparkException: Job aborted: Task 2.0:2 failed 4 times
(most recent failure: Exception failure: java.lang.NoClassDefFoundError:
Could not initialize class main.scala.Utils$)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
        at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at scala.Option.foreach(Option.scala:236)
        at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
        at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)





Re: Need suggestions

Posted by andy petrella <an...@gmail.com>.
Sorry, perhaps I was not clear. Anyway, could you try making the path in the
*List* an absolute one? E.g.
List("/home/yh/src/pj/spark-stuffs/target/scala-2.10/simple-project_2.10-1.0.jar")

If you want to provide a relative path, you first need to figure out your CWD.
To be really sure, you can do:

// before creating the new SparkContext
println(new java.io.File(".").toUri.toString) // didn't try it, so adapt to make scalac happy ^^
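
Putting the two together, a minimal sketch (the master URL and spark home are the ones
from your snippet, the jar path is the one above; adjust to your machine):

import org.apache.spark.SparkContext

// Print the driver's working directory first, so any relative jar path can be checked against it.
println(new java.io.File(".").getAbsolutePath)

// Pass the application jar with an absolute path so it no longer depends on the launch directory.
val sc = new SparkContext("spark://spark-master-001:7077", "Simple App",
  utilclass.spark_home,
  List("/home/yh/src/pj/spark-stuffs/target/scala-2.10/simple-project_2.10-1.0.jar"))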





Re: Need suggestions

Posted by yh18190 <yh...@gmail.com>.
Hi,
Here is the SparkContext setup. Do I need to pass any more jars to the slaves
separately, or is this enough?
I am able to see the created jar in my target directory.

 val sc = new SparkContext("spark://spark-master-001:7077", "Simple App",
utilclass.spark_home,
              List("target/scala-2.10/simple-project_2.10-1.0.jar"))





Re: Need suggestions

Posted by andy petrella <an...@gmail.com>.
I cannot access your repo; however, my gut feeling is that
*"target/scala-2.10/simple-project_2.10-1.0.jar"* is not enough (say, if
your cwd is not the folder containing *target*). I'd say it'd be
easier to put an absolute path...
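
For instance, a minimal sketch that turns the relative path into an absolute one at
runtime (same path as in your snippet):

// Resolve the jar relative to wherever the driver process was actually launched,
// rather than assuming the launch directory is the project root.
val appJar = new java.io.File("target/scala-2.10/simple-project_2.10-1.0.jar").getAbsolutePath
// ...then pass List(appJar) as the jars argument of the SparkContext constructor.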

My2c





Re: Need suggestions

Posted by yh18190 <yh...@gmail.com>.
It's working in local mode, but not in cluster mode with 4 slaves.




Re: Need suggestions

Posted by yh18190 <yh...@gmail.com>.
Hi,
Thanks for the response. Could you please look into my repo? Utils is the
class in question; I cannot paste the entire code here, that's why.

I have another class from which I will be calling the Utils class for object
creation.

package main.scala

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import com.codahale.jerkson.Json._
import scala.collection.JavaConversions._
import scala.collection.immutable._
import scala.io.Source
import rtree._

object Utils {
    //Main project directory
  	val work_directory = "/Users/Meghana/Documents/workspace/assignment"
    //Location of spark home 
  	val spark_home = "/Users/Meghana/Downloads/spark-0.9.0-incubating"
  	  
  	//Location of Twitter data
	val data_home = "/Users/Meghana/Desktop/Twitter/Europe/2012/3/1"
	//CSV file that has the city bounding boxes
	val bbox_file = work_directory + "/cities_eu.csv"
	  
	//Locations to store the intermediate data
	val intermediate_ucg_data = work_directory + "/ucg_int"
	val intermediate_rt_data = work_directory + "/rtc_int"
	val intermediate_wc_data = work_directory + "/wc_int"
	val intermediate_ucc_data = work_directory + "/ucc_int"
	
	//Create spark context
    val sc = new SparkContext("local", "Simple App", Utils.spark_home,
	      List("target/scala-2.10/simple-project_2.10-1.0.jar"))
  	
  	//RTree structure with key as the city name. 
  	//First initialize with an empty tree.
  	var rtree:RTree[String] = RTree.empty
  	
    //default window size
  	var jumping_window_size:Integer = 1;
  	var sliding_window_size:Integer = 1;

  	
    //We start from hour 1
	val initialHour:Integer = 1;
	
	//We calculate with this frequency (eg; every one hour)
	val calcFreq:Integer = 1;
	
    //Object representation of Tweet
  	//text -> Tweet text
  	//retweets -> number of re-tweets of the current tweet.
  	//country -> country where the tweet appeared
  	//city -> from which city the tweet has appeared
  	//hour -> hour of the tweet.
  	class Tweet(val user:String, 
  	    val text: String, 
  	    val retweets:Integer, 
  	    val country:String, 
  	    val city:String,
  	    val hour:Int) {
		override def toString: String =
				"User: " + user + "\n" +
				"Text: " + text + "\n" + 
				"Retweets: " + retweets + "\n" +
				"Country: " + country + "\n" + 
				"City: " + city + "\n" +
				"hour: " + hour + "\n"
	} 

    //Function to parse a line of string to a json object, and then create a Tweet instance
	def parseTweet(s: String): Tweet = {
	    //Parse the given json line from the twitter dataset to a key:value map.
		val tweet_details_map = parse[Map[String, Any]](s)
		//Extract tweet string from the given line
		val text:String = tweet_details_map.get("text").get.asInstanceOf[String]
		//Extract the user data from the tweet line.
		val user_details = tweet_details_map.get("user").get.asInstanceOf[java.util.LinkedHashMap[String,Any]]
		//Extract the retweet count from the given tweet line.
		val retweets:Integer = tweet_details_map.get("retweet_count").get.asInstanceOf[Integer]

https://bitbucket.org/smartmetersproject/twitterdatasets1/src/c379405f1437a9eb4fc7fa0f3f9a2834e766ad2d/src/main/scala/Utils.scala?at=master




Re: Need suggestions

Posted by andy petrella <an...@gmail.com>.
TL;DR
Your classes are missing on the workers; pass the jar containing the class
main.scala.Utils to the SparkContext.

Longer:
I'm missing some information, like how the SparkContext is configured, but my
best guess is that you didn't provide the jars (setJars on SparkConf or the
jars parameter of the SparkContext constructor).

The classes are not found on the slave, which is another node, another
machine (or environment), and so on. For your class to run there, the slave
must be able to load it -- this is handled by Spark, but you have to pass the
jar as an argument...

This jar can simply be your current project packaged as a jar using
maven/sbt/...
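
For example, a sketch pulling the pieces together (the master URL is the one from this
thread; the jar and spark home paths are placeholders, point them at your own build
output and installation):

// Build the application jar first, e.g. with:  sbt package
import org.apache.spark.{SparkConf, SparkContext}

val jars = Seq("/absolute/path/to/simple-project_2.10-1.0.jar") // placeholder path to the packaged jar

// Either pass the jars through the SparkContext constructor...
val sc = new SparkContext("spark://spark-master-001:7077", "Simple App", "/path/to/spark/home", jars)

// ...or set them on a SparkConf before creating the context (use one of the two, not both).
val conf = new SparkConf()
  .setMaster("spark://spark-master-001:7077")
  .setAppName("Simple App")
  .setJars(jars)
val sc2 = new SparkContext(conf)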

HTH

