Posted to user@spark.apache.org by Gianluca Privitera <gi...@studio.unibo.it> on 2014/07/15 11:17:13 UTC

Spark Streaming w/ tshark exception problem on EC2

Hi,
I’ve got a problem with Spark Streaming and tshark.
When I run this code locally I have no problems, but when I run it on an EC2 cluster I get the exception shown just below the code.

import scala.sys.process._

def dissection(s: String): Seq[String] = {
    try {

      Process("hadoop command to create ./localcopy.tmp").! // calls hadoop to copy a file from S3 locally
      val pb = Process("tshark … localcopy.tmp") // calls tshark to transform the local copy into a sequence of strings
      pb.lines_!.toSeq

    } catch {
      case e: Exception =>
        System.err.println("ERROR")
        Seq.empty[String]
    }
  }

(line 2051 points to the function “dissection”)
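
The function is invoked from inside a streaming transformation, roughly like
this (a simplified sketch; the stream and variable names are placeholders,
not the actual code around line 2051):

// Simplified call site (hypothetical names): dissection runs inside a closure
// that Spark ships to the executors, so the error shows up there rather than
// on the driver.
val parsed = stream.flatMap(dissection)
val counted = parsed.map(line => (line, 1)).reduceByKey(_ + _) // pair operation matching the combineValuesByKey frame below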

WARN scheduler.TaskSetManager: Loss was due to java.lang.ExceptionInInitializerError
java.lang.ExceptionInInitializerError
at Main$$anonfun$11.apply(Main.scala:2051)
at Main$$anonfun$11.apply(Main.scala:2051)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Has anyone got an idea why that may happen? I’m pretty sure that the hadoop call works perfectly.

Thanks
Gianluca

Re: Spark Streaming w/ tshark exception problem on EC2

Posted by Tathagata Das <ta...@gmail.com>.
A quick Google search of that exception says it occurs when a static
initializer (or the initialization of a static field) throws. Could be some
issue related to how dissection is defined. Maybe try putting the function in
a different object (static class) that is unrelated to the Main class, which
may have other static initialization stuff? It's hard to say much more
without more context.
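
For example, something roughly like this (the object name is just a
placeholder, and the body is the same logic as in your function) keeps the
closure from touching Main's static initializer on the executors:

import scala.sys.process._

// Hypothetical standalone object: when the closure is deserialized on an
// executor, only this small object is initialized, not whatever static
// initialization Main carries around.
object Dissector {
  def dissection(s: String): Seq[String] = {
    try {
      Process("hadoop command to create ./localcopy.tmp").!
      Process("tshark … localcopy.tmp").lines_!.toSeq
    } catch {
      case e: Exception =>
        System.err.println("ERROR")
        Seq.empty[String]
    }
  }
}

// In the streaming code, call it through the object instead of Main, e.g.:
// stream.flatMap(Dissector.dissection)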

TD

