Posted to dev@tinkerpop.apache.org by Jason Plurad <pl...@gmail.com> on 2016/01/28 21:48:25 UTC

spark standalone cluster 1.5.2

We're running into this error with standalone Spark clusters
<http://spark.apache.org/docs/1.5.2/spark-standalone.html>.

```
WARN  org.apache.spark.scheduler.TaskSetManager  - Lost task 0.0 in stage 0.0 (TID 0, 192.168.14.103): java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = -3343649307726848892, local class serialVersionUID = -3996494161745401652
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:621)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
```

You can reproduce this error 2 ways:
* Run a SparkGraphComputer from TinkerPop 3.1.0-incubating against a Spark
1.5.2 standalone cluster
* Run a SparkGraphComputer from TinkerPop 3.1.1-SNAPSHOT against a Spark
1.5.1 standalone cluster

Only the standalone Spark cluster deployment breaks -- the Spark cluster version
must exactly match the version that TinkerPop is built against.
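
To see which UID each side actually computes for RDD, here's a minimal sketch
(just an illustration, not part of our setup) that can be run once against the
driver's Spark jars and once against the cluster's -- the two numbers in the
trace above come from exactly this kind of computed default:

```
import java.io.ObjectStreamClass;
import org.apache.spark.rdd.RDD;

public class CheckRddUid {
    public static void main(String[] args) {
        // For Serializable classes that don't declare a serialVersionUID,
        // the JVM derives one from the class structure, so the value can
        // differ between Spark builds.
        ObjectStreamClass desc = ObjectStreamClass.lookup(RDD.class);
        System.out.println("RDD serialVersionUID = " + desc.getSerialVersionUID());
    }
}
```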

This commit
<https://github.com/apache/incubator-tinkerpop/commit/78b10569755070b088c460341bb473112dfe3ffe#diff-402e09222db9327564f28924e1b39d0c>
bumped up the Spark version from 1.5.1 to 1.5.2. As Marko mentioned, it
does pass the unit tests, but the unit tests are run with
`spark.master=local`. I've tested that it also works with
`spark.master=yarn-client`.
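
To be clear about what differs between those runs, here's a rough Java sketch of
the three master settings (the spark:// URL and app name are placeholders, not
our actual config):

```
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MasterModes {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("tinkerpop-spark-test");

        // conf.setMaster("local[4]");       // unit tests: single JVM, no issue seen
        // conf.setMaster("yarn-client");    // tested fine
        conf.setMaster("spark://127.0.0.1:7077");  // standalone cluster: where the
                                                   // InvalidClassException shows up
                                                   // when the versions don't match

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println(sc.parallelize(java.util.Arrays.asList(1, 2, 3)).count());
        sc.stop();
    }
}
```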

What is -- or rather, what should be -- the direction/policy for dependency
version upgrades in TinkerPop?

-- Jason

Re: spark standalone cluster 1.5.2

Posted by Jason Plurad <pl...@gmail.com>.
I agree we should get serialization addressed in TinkerPop. I'm still a bit
surprised that Spark has this problem, considering its popularity, so I guess it
proves there are idiots everywhere ;)
On Fri, Jan 29, 2016 at 12:10 PM Marko Rodriguez <ok...@gmail.com>
wrote:

> Hi Jason,
>
> > In the meantime, it sounds like you have to match the compiled Spark
> > version with the runtime. I saw a bunch of posts and a couple JIRA where
> > they always came back to that as the solution.
>
> So whats the deal for us? I say we release with Spark 1.5.2 as its a minor
> bump and if there is a "jar swap" trick that works for people, thats that.
>
> > Wonder how exposed TinkerPop is with Serializable and serialVersionUIDs.
>
> Dan LaRocque was basically saying we are idiots for not using
> serialVersionIDs. I didn't even know what that was all about until he told
> me. I think we DEFINITELY need to get that solid for 3.2.0.
>
> Thoughts?,
> Marko.

Re: spark standalone cluster 1.5.2

Posted by Marko Rodriguez <ok...@gmail.com>.
Hi Jason,

> In the meantime, it sounds like you have to match the compiled Spark
> version with the runtime. I saw a bunch of posts and a couple JIRA where
> they always came back to that as the solution.

So what's the deal for us? I say we release with Spark 1.5.2 since it's a minor bump, and if there is a "jar swap" trick that works for people, that's that.

> Wonder how exposed TinkerPop is with Serializable and serialVersionUIDs.

Dan LaRocque was basically saying we are idiots for not using serialVersionUIDs. I didn't even know what that was all about until he told me. I think we DEFINITELY need to get that solid for 3.2.0.

Thoughts?,
Marko.



Re: spark standalone cluster 1.5.2

Posted by Jason Plurad <pl...@gmail.com>.
They came back with https://issues.apache.org/jira/browse/SPARK-13084

RDD
<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L74>
is declared as Serializable, but it doesn't define a serialVersionUID.
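
Without an explicit UID the JVM computes one from the class structure, so any
change to the class between builds can produce a different value and break
deserialization, which is exactly the mismatch in the trace. For reference, the
usual way to pin it looks like this (a made-up class for illustration, not
Spark's or TinkerPop's code):

```
import java.io.Serializable;

public class ExampleState implements Serializable {
    // An explicit UID keeps the serialized form compatible across builds,
    // as long as the fields themselves remain readable.
    private static final long serialVersionUID = 1L;

    private long counter;
    private String label;
}
```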

In the meantime, it sounds like you have to match the Spark version you compile
against with the one running on the cluster. I saw a bunch of posts and a couple
of JIRAs where they always came back to that as the solution.

I wonder how exposed TinkerPop is with Serializable classes and serialVersionUIDs.



Re: spark standalone cluster 1.5.2

Posted by Jason Plurad <pl...@gmail.com>.
Yeah, I was surprised by the incompatibility. It seems to be contained to the
standalone Spark deployment only.

You can reproduce the same stack trace with their Spark Pi example on standalone
Spark servers (try running the Pi example built against 1.5.2 on a 1.5.1
standalone cluster, or the 1.5.1 Pi on a 1.5.2 standalone cluster).

yarn-client and local tested out fine.

I'll post to the Spark list and see what they come back with.


On Thu, Jan 28, 2016 at 3:51 PM, Marko Rodriguez <ok...@gmail.com>
wrote:

> Hello,
>
> This is odd. We are currently doing TinkerPop 3.1.1-SNAPSHOT + Spark 1.5.2
> 2-billion edge benchmarking (against SparkServer) and all is good.
>
> Are you saying that Spark 1.5.1 and Spark 1.5.2 are incompatible? Thats a
> bummer.
>
> I don't think there is an "official policy," but I always bump minor
> release versions with minor release versions. That is, I didn't bump to
> Spark 1.6.0 (we will do that for TinkerPop 3.2.0), but since 1.5.1 is minor
> to 1.5.2, I bumped. We have always done that -- e.g. Neo4j, Hadoop, various
> Java libraries…
>
> Thoughts?,
> Marko.
>
> http://markorodriguez.com
>

Re: spark standalone cluster 1.5.2

Posted by Marko Rodriguez <ok...@gmail.com>.
Hello,

This is odd. We are currently doing 2-billion-edge benchmarking with TinkerPop 3.1.1-SNAPSHOT + Spark 1.5.2 (against SparkServer) and all is good.

Are you saying that Spark 1.5.1 and Spark 1.5.2 are incompatible? That's a bummer.

I don't think there is an "official policy," but I always bump minor release versions with minor release versions. That is, I didn't bump to Spark 1.6.0 (we will do that for TinkerPop 3.2.0), but since 1.5.1 to 1.5.2 is a minor bump, I took it. We have always done that -- e.g. Neo4j, Hadoop, various Java libraries…

Thoughts?,
Marko.

http://markorodriguez.com
