Posted to user@spark.apache.org by Jeffrey Picard <jp...@columbia.edu> on 2014/09/10 23:00:00 UTC

java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2

Hey guys,

After rebuilding from the master branch this morning, I’ve started to see these errors that I’ve never gotten before while running connected components. Anyone seen this before?
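For context, the failure shows up in the shuffle inside a GraphX connected components job. A minimal sketch of the kind of driver that hits this code path (the app name and edge-list path here are hypothetical placeholders, not from the original report) looks like:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

// Hypothetical driver: load an edge list and run connected components.
// With sort-based shuffle enabled, the ClassCastException below surfaces
// during the shuffle triggered inside connectedComponents().
object CCRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cc-repro")
    val sc = new SparkContext(conf)
    // "edges.txt" is a placeholder path to a whitespace-separated edge list.
    val graph = GraphLoader.edgeListFile(sc, "edges.txt")
    val cc = graph.connectedComponents().vertices
    println(cc.take(10).mkString("\n"))
    sc.stop()
  }
}
```

This requires a running Spark deployment, so it is a reproduction sketch rather than a standalone runnable example.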

14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 87 spilling in-memory batch of 1020 MB to disk (1 spill so far)
14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 58 spilling in-memory batch of 1020 MB to disk (1 spill so far)
14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 57 spilling in-memory batch of 1020 MB to disk (1 spill so far)
14/09/10 20:38:53 INFO collection.ExternalSorter: Thread 60 spilling in-memory batch of 1020 MB to disk (1 spill so far)
14/09/10 20:39:15 ERROR executor.Executor: Exception in task 275.0 in stage 3.0 (TID 994)
java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
        at org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
        at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
        at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:271)
        at org.apache.spark.util.collection.ExternalSorter.maybeSpill(ExternalSorter.scala:249)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:220)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/09/10 20:39:15 ERROR executor.Executor: Exception in task 176.0 in stage 3.0 (TID 894)
java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2
        at org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
        at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329)
        at org.apache.spark.util.collection.ExternalSorter.spill(ExternalSorter.scala:271)
        at org.apache.spark.util.collection.ExternalSorter.maybeSpill(ExternalSorter.scala:249)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:220)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2

Posted by Ankur Dave <an...@gmail.com>.
I diagnosed this problem today and found that it's because the GraphX custom serializers make an assumption that is violated by sort-based shuffle. I filed SPARK-3649 [1] explaining the problem and submitted a PR to fix it [2].

The fix removes the custom serializers, which carries about a 10% performance penalty for PageRank, since the custom serializers were written specifically to optimize it. Other applications should see much less slowdown.

Ankur

[1] https://issues.apache.org/jira/browse/SPARK-3649
[2] https://github.com/apache/spark/pull/2503



Re: java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2

Posted by nit <ni...@gmail.com>.
@ankur - I have also seen this recently. Is there a patch available for this
issue?
(In my recent experience on non-GraphX apps, sort-based shuffle handles
memory pressure better...)



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassCastException-java-lang-Long-cannot-be-cast-to-scala-Tuple2-tp13926p14501.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2

Posted by Ankur Dave <an...@gmail.com>.
On Wed, Sep 10, 2014 at 2:00 PM, Jeffrey Picard <jp...@columbia.edu> wrote:

> After rebuilding from the master branch this morning, I’ve started to see
> these errors that I’ve never gotten before while running connected
> components. Anyone seen this before?
> [...]
>         at
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>         at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>

I think GraphX might not handle sort-based shuffle properly, and it was
made the default recently [1]. If that's the problem, a temporary
workaround would be to set spark.shuffle.manager to "hash".
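Concretely, the workaround can be applied in the driver before the SparkContext is created; this is a sketch under the assumption that you control the driver code (the same property can also be passed via `--conf` to spark-submit):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Temporary workaround: revert to hash-based shuffle until the GraphX
// serializer fix lands. Must be set before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("cc-with-hash-shuffle")
  .set("spark.shuffle.manager", "hash")
val sc = new SparkContext(conf)
```

Since this is a job-level configuration fragment, it is shown without output; it simply disables the sort-based shuffle path that triggers the exception.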

Ankur <http://www.ankurdave.com/>

[1] https://github.com/apache/spark/pull/2178