Posted to issues@spark.apache.org by "Fernando Pereira (JIRA)" <ji...@apache.org> on 2018/01/15 11:28:02 UTC

[jira] [Comment Edited] (SPARK-21172) EOFException reached end of stream in UnsafeRowSerializer

    [ https://issues.apache.org/jira/browse/SPARK-21172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326122#comment-16326122 ] 

Fernando Pereira edited comment on SPARK-21172 at 1/15/18 11:28 AM:
--------------------------------------------------------------------

With my previous, smaller dataset I was able to make the job run by changing the number of partitions (spark.sql.shuffle.partitions) to something more standard.
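For reference, this is roughly how that setting can be changed (a minimal sketch; the app name and the value 200, which is Spark's default, are just placeholders for whatever works in a given deployment):

{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical setup: app name and partition count are placeholders.
val spark = SparkSession.builder()
  .appName("shuffle-partitions-workaround")
  .config("spark.sql.shuffle.partitions", "200")   // back to Spark's default
  .getOrCreate()

// The value can also be changed at runtime, before the shuffle-heavy stage runs:
spark.conf.set("spark.sql.shuffle.partitions", "200")
{code}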

However, now with a 200GB dataset there isn't a setting that can make it work. I always hit the problem sooner or later, sometimes after more than 1000 partitions have already been processed.

I really believe that, since tuning config values only works around the problem, there must be a bug in the shuffle read path that doesn't handle all corner cases.

Another symptom is this error message occurring in other workers:
{code:java}
java.lang.IndexOutOfBoundsException: len is negative
	at org.spark_project.guava.io.ByteStreams.read(ByteStreams.java:895)
	at org.spark_project.guava.io.ByteStreams.readFully(ByteStreams.java:733)
	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:127)
	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
{code}
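As I understand it, both this error and the EOFException in the original report below come from the same length-prefixed read in the shuffle deserializer: a row size is read first, then that many bytes are fetched with ByteStreams.readFully, so a corrupted size field shows up either as a negative length or as more bytes expected than the stream contains. A simplified sketch of that pattern (only an illustration of the failure mode, not the actual Spark source):

{code:scala}
import java.io.DataInputStream
import com.google.common.io.ByteStreams  // shaded in Spark as org.spark_project.guava.io.ByteStreams

object RowReadSketch {
  // Reused buffer, grown only when a larger row arrives.
  private var rowBuffer = new Array[Byte](1024)

  def readRow(in: DataInputStream): Array[Byte] = {
    val rowSize = in.readInt()              // garbage if the stream is corrupted
    if (rowSize > rowBuffer.length) {
      rowBuffer = new Array[Byte](rowSize)
    }
    // A negative rowSize surfaces as "IndexOutOfBoundsException: len is negative";
    // a rowSize larger than what remains surfaces as "EOFException: reached end of stream".
    ByteStreams.readFully(in, rowBuffer, 0, rowSize)
    rowBuffer
  }
}
{code}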


> EOFException reached end of stream in UnsafeRowSerializer
> ---------------------------------------------------------
>
>                 Key: SPARK-21172
>                 URL: https://issues.apache.org/jira/browse/SPARK-21172
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle
>    Affects Versions: 2.0.1
>            Reporter: liupengcheng
>            Priority: Major
>              Labels: shuffle
>
> Spark SQL job failed because of the following exception. Seems like a bug in the shuffle stage.
> Shuffle read size for a single task is tens of GB.
> {code}
> org.apache.spark.SparkException: Task failed while writing rows
> 	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:264)
> 	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> 	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:86)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.EOFException: reached end of stream after reading 9034374 bytes; 1684891936 bytes expected
> 	at org.spark_project.guava.io.ByteStreams.readFully(ByteStreams.java:735)
> 	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
> 	at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
> 	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> 	at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
> 	at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
> 	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
> 	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:255)
> 	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:253)
> 	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:253)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1345)
> 	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:259)
> 	... 8 more
> {code}


