Posted to issues@spark.apache.org by "Eunsu Yun (JIRA)" <ji...@apache.org> on 2014/05/29 07:50:01 UTC

[jira] [Created] (SPARK-1960) EOFException when a 0-size file exists and sc.sequenceFile[K,V]("path") is used

Eunsu Yun created SPARK-1960:
--------------------------------

             Summary: EOFException when a 0-size file exists and sc.sequenceFile[K,V]("path") is used
                 Key: SPARK-1960
                 URL: https://issues.apache.org/jira/browse/SPARK-1960
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.0.0
            Reporter: Eunsu Yun



java.io.EOFException is thrown by sc.sequenceFile[K,V] when the input path contains a file whose size is 0.
I also tested sc.textFile() under the same conditions, and it does not throw an EOFException.

val text = sc.sequenceFile[Long, String]("data-gz/*.dat.gz")
val result = text.filter(filterValid)
result.saveAsTextFile("data-out/")
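A possible workaround (not part of the original report) is to skip zero-byte files before handing the paths to sc.sequenceFile. The helper below is a minimal sketch: it assumes the (path, size) pairs have already been obtained, e.g. from Hadoop's FileSystem.globStatus, and keeps only non-empty files.

```scala
// Sketch of a workaround: drop zero-byte files before reading.
// In a real job the (path, size) pairs would come from the Hadoop
// FileSystem API, e.g. fs.globStatus(new Path("data-gz/*.dat.gz"))
// mapped to (status.getPath.toString, status.getLen).
object SkipEmptyFiles {
  // Keep only paths whose recorded size is greater than zero.
  def nonEmptyPaths(files: Seq[(String, Long)]): Seq[String] =
    files.collect { case (path, size) if size > 0 => path }

  def main(args: Array[String]): Unit = {
    // Small demonstration with hard-coded sizes.
    val files = Seq(("a.dat.gz", 1024L), ("empty.dat.gz", 0L), ("b.dat.gz", 5L))
    println(nonEmptyPaths(files).mkString(","))
  }
}
```

The surviving paths can then be joined with commas and passed to the existing call, e.g. sc.sequenceFile[Long, String](nonEmptyPaths(statuses).mkString(",")), so the SequenceFile reader never sees an empty file.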


------------------

java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.<init>(SequenceFileRecordReader.java:49)
	at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:64)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
	at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
	at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:33)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
..............



--
This message was sent by Atlassian JIRA
(v6.2#6252)