Posted to dev@spark.apache.org by Evan Chan <ev...@ooyala.com> on 2013/10/30 17:40:05 UTC

Getting failures in FileServerSuite

I'm at the latest:

commit f0e23a023ce1356bc0f04248605c48d4d08c2d05
Merge: aec9bf9 a197137
Author: Reynold Xin <rx...@apache.org>
Date:   Tue Oct 29 01:41:44 2013 -0400


and seeing this when I do a "test-only FileServerSuite":

13/10/30 09:35:04.300 INFO DAGScheduler: Completed ResultTask(0, 0)
13/10/30 09:35:04.307 INFO LocalTaskSetManager: Loss was due to
java.io.StreamCorruptedException
java.io.StreamCorruptedException: invalid type code: AC
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
        at org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:101)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
        at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:26)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:27)
        at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:53)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:95)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:94)
        at org.apache.spark.rdd.MapPartitionsWithContextRDD.compute(MapPartitionsWithContextRDD.scala:40)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:107)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:212)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:680)
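
For context: 0xAC is the first byte of the stream header (0xACED) that every
new java.io.ObjectOutputStream writes, so "invalid type code: AC" usually
means a second stream header appeared in the middle of a stream that a single
ObjectInputStream is still reading -- i.e. the writer and reader disagree
about where one object stream ends and the next begins. A minimal standalone
sketch -- not Spark code -- that reproduces the exact error:

    import java.io._

    val buf = new ByteArrayOutputStream()

    val out1 = new ObjectOutputStream(buf)   // writes header 0xACED + object
    out1.writeObject("first"); out1.flush()

    val out2 = new ObjectOutputStream(buf)   // writes a SECOND header mid-stream
    out2.writeObject("second"); out2.flush()

    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject()  // ok: "first"
    in.readObject()  // StreamCorruptedException: invalid type code: AC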


Anybody else seen this yet?

I have a really simple PR, and this suite fails even without my change,
so I may go ahead and submit it anyway.

--
Evan Chan
Staff Engineer
ev@ooyala.com

Re: Getting failures in FileServerSuite

Posted by Evan Chan <ev...@ooyala.com>.
Oops, hit enter too soon.

Mark: 1.6.0_51.  I'm hesitant to upgrade to JDK 1.7, as several folks
reported problems on OSX.  Also, I have no problems building the
assembly, and it only takes me about 2 minutes (I'm running on SSDs,
though :)

I might bite the bullet and upgrade to JDK 1.7 for other reasons though.

Patrick: I know the branch with my config overhaul, last merged from
master on Oct 10th, doesn't exhibit this problem.  (Note that I tend to
set SPARK_LOCAL_IP to "localhost", which I don't think affects this;
it fails whether it's set or not, I believe.)  We could
potentially run a git bisect starting roughly 2-3 weeks ago.
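
For reference, a bisect along those lines might look roughly like the
following; the "good" SHA is a placeholder for wherever master was around
Oct 10th, and the test-only pattern assumes the usual sbt invocation:

    git bisect start
    git bisect bad f0e23a023ce1356bc0f04248605c48d4d08c2d05   # known bad (current master)
    git bisect good <sha-of-master-around-oct-10>             # last known good
    # git now checks out a midpoint commit; run the suite and report:
    sbt/sbt "test-only *FileServerSuite" && git bisect good || git bisect bad
    # repeat until git names the first bad commit, then:
    git bisect reset
    # (or automate the loop with: git bisect run sbt/sbt "test-only *FileServerSuite")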



On Sun, Nov 3, 2013 at 3:55 PM, Evan Chan <ev...@ooyala.com> wrote:
> [...]



--
Evan Chan
Staff Engineer
ev@ooyala.com

Re: Getting failures in FileServerSuite

Posted by Evan Chan <ev...@ooyala.com>.
Mark: I'm using JDK 1.6.

On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra <ma...@clearstorydata.com> wrote:
> [...]



--
Evan Chan
Staff Engineer
ev@ooyala.com

RE: Getting failures in FileServerSuite

Posted by "Shao, Saisai" <sa...@intel.com>.
Hi All,

I sent a mail about this stream-corruption problem a few days ago.

I'm using the published Spark 0.8.0-incubating release, not the latest master, and my Java version is 1.6.0_30, so I don't think this is a recently introduced problem. It has been blocking me lately, and I was wondering if you have any clues.

Thanks
Jerry

-----Original Message-----
From: Mark Hamstra [mailto:mark@clearstorydata.com] 
Sent: Thursday, October 31, 2013 7:25 AM
To: dev@spark.incubator.apache.org
Subject: Re: Getting failures in FileServerSuite

[...]

Re: Getting failures in FileServerSuite

Posted by Mark Hamstra <ma...@clearstorydata.com>.
Maybe I was bailing too early, Kay.  I'm sure I waited at least 15 mins,
but maybe not 30.



On Wed, Oct 30, 2013 at 3:45 PM, Kay Ousterhout <ke...@eecs.berkeley.edu> wrote:

> [...]

Re: Getting failures in FileServerSuite

Posted by Kay Ousterhout <ke...@eecs.berkeley.edu>.
Patrick: I don't think this was caused by a recent merge -- pretty sure I
was seeing it last week.

Mark: Are you sure the examples assembly is hanging, as opposed to just
taking a long time?  It takes ~30 minutes on my machine (not doubting that
the Java version update fixes it -- just pointing out that if you wait, it
may actually finish).

Evan: One thing to note is that the log message is wrong (see
https://github.com/apache/incubator-spark/pull/126): the task is actually
failing just once, not 4 times.  Doesn't help fix the issue -- but just
thought I'd point it out in case anyone else is trying to look into this.


On Wed, Oct 30, 2013 at 2:08 PM, Patrick Wendell <pw...@gmail.com> wrote:

> [...]

Re: Getting failures in FileServerSuite

Posted by Patrick Wendell <pw...@gmail.com>.
This may have been caused by a recent merge since a bunch of people
independently hit it in the last 48 hours.

One debugging step would be to narrow it down to which merge caused
it. I don't have time personally today, but it's just a suggestion for
people for whom this is blocking progress.

- Patrick

On Wed, Oct 30, 2013 at 1:44 PM, Mark Hamstra <ma...@clearstorydata.com> wrote:
> [...]

Re: Getting failures in FileServerSuite

Posted by Mark Hamstra <ma...@clearstorydata.com>.
What JDK version are you using, Evan?

I tried to reproduce your problem earlier today, but I wasn't even able to
get through the assembly build -- it kept hanging when trying to build the
examples assembly.  Forgoing the assembly and running the tests would hang
on FileServerSuite "Dynamically adding JARS locally" -- no stack trace,
just hung.  And I was actually seeing a very similar stack trace to yours
from a test suite of our own running against 0.8.1-SNAPSHOT -- not exactly
the same, because the line numbers were different once it went into the Java
runtime, and it eventually ended up someplace a little different.  That got
me curious about differences in Java versions, so I updated to the latest
Oracle release (1.7.0_45).  Now it cruises right through the build and test
of Spark master from before Matei merged your PR.  Then I logged into a
machine that has 1.7.0_15 (7u15-2.3.7-0ubuntu1~11.10.1, actually)
installed, and I'm right back to the hanging during the examples assembly
(but it passes FileServerSuite, oddly enough).  Upgrading the JDK didn't
improve the results of the ClearStory test suite I was looking at, so my
misery isn't over, but yours might be with a newer JDK...
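
For anyone comparing environments, the commands in play here are roughly the
usual ones for the 0.8.x build (a sketch; exact targets may differ):

    java -version       # e.g. 1.6.0_51 vs. 1.7.0_45 -- the suspect variable
    sbt/sbt assembly    # builds the core and examples assemblies that kept hanging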



On Wed, Oct 30, 2013 at 12:44 PM, Evan Chan <ev...@ooyala.com> wrote:

> [...]

Re: Getting failures in FileServerSuite

Posted by Evan Chan <ev...@ooyala.com>.
Must be a local environment thing, because AmpLab Jenkins can't
reproduce it... :-p

On Wed, Oct 30, 2013 at 11:10 AM, Josh Rosen <ro...@gmail.com> wrote:
> [...]



--
Evan Chan
Staff Engineer
ev@ooyala.com

Re: Getting failures in FileServerSuite

Posted by Josh Rosen <ro...@gmail.com>.
Someone on the users list also encountered this exception:

https://mail-archives.apache.org/mod_mbox/incubator-spark-user/201310.mbox/%3C64474308D680D540A4D8151B0F7C03F7025EF289%40SHSMSX104.ccr.corp.intel.com%3E


On Wed, Oct 30, 2013 at 9:40 AM, Evan Chan <ev...@ooyala.com> wrote:

> [...]