Posted to users@zeppelin.apache.org by David Salinas <da...@gmail.com> on 2015/08/21 11:25:19 UTC

Closure issue with spark 1.4.1

Hi,

I have a problem when using Spark closures. This error did not appear with
Spark 1.2.1.

I have included a reproducible example that fails when taking the closure
(Zeppelin was built from the head of master with this command: mvn install
-DskipTests -Pspark-1.4 -Dspark.version=1.4.1 -Dhadoop.version=2.2.0
-Dprotobuf.version=2.5.0). Has anyone ever encountered this problem? All my
previous notebooks are broken by this :(

------------------------------
val textFile = sc.textFile("hdfs://somefile.txt")

val f = (s: String) => s + s
textFile.map(f).count
//works fine
//res145: Long = 407


def f(s: String) = {
    s + s
}
textFile.map(f).count

//fails ->

org.apache.spark.SparkException: Job aborted due to stage failure: Task 566
in stage 87.0 failed 4 times, most recent failure: Lost task 566.3 in stage
87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
Lorg/apache/zeppelin/spark/ZeppelinContext;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
    at java.lang.Class.getDeclaredField(Class.java:2068)
    ...
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
    at ...
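
A plausible reading of the difference (an educated guess from the trace, not
a confirmed diagnosis): the val binds a plain function literal that touches
nothing outside its own parameter, so the serialized closure is
self-contained; the def, by contrast, is compiled as a method of the
interpreter-generated wrapper object, so textFile.map(f) eta-expands into a
closure over that wrapper, dragging its fields (including Zeppelin's z) into
the task, and the NoClassDefFoundError fires on workers where the Zeppelin
classes are absent. A small driver-side helper to probe this (a sketch using
plain Java serialization; it catches non-serializable captures locally,
while the missing-class flavour above only surfaces when a worker
deserializes the task):

import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// serializes an object the way Spark ships task closures; throws
// java.io.NotSerializableException if the closure captured something
// that cannot be serialized
def checkSerializable(closure: AnyRef): Unit = {
  val out = new ObjectOutputStream(new ByteArrayOutputStream())
  out.writeObject(closure)
  out.close()
}

checkSerializable((s: String) => s + s) // the val case: captures nothing
checkSerializable(f _)                  // the def from above: captures the wrapper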

Best,

David

Re: Closure issue with spark 1.4.1

Posted by David Salinas <da...@gmail.com>.
Hi,

I have tried this example after applying
https://github.com/apache/incubator-zeppelin/pull/270,
but it is not working, for several reasons:

1/ Zeppelin context is not found (!):
val s = z.input("Foo")
<console>:21: error: not found: value z

2/ If I include my jar, the classpath is not communicated to the slaves, so
the code works only locally (it used to work on the cluster before this
change). I guess there is something wrong with the way I set the classpath
(which is probably also linked to 1/).

I have added this line in zeppelin-env.sh to use one of my jars:
export ZEPPELIN_JAVA_OPTS="-Dspark.driver.host=`hostname`
-Dspark.mesos.coarse=true -Dspark.executor.memory=20g -Dspark.cores.max=80
-Dspark.jars=${SOME_JAR} -cp ${SOME_CLASSPATH_FOR_THE_JAR}"

How can one add an extra classpath jar with this new version? Could you add
a ZEPPELIN_EXTRA_JAR or ZEPPELIN_EXTRA_CLASSPATH variable to zeppelin-env.sh
so that users can easily add their own code?
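
For illustration, here is roughly what I would have expected to work, with
placeholder paths (spark.jars and spark.executor.extraClassPath are standard
Spark 1.x properties for shipping user code to executors; whether the new
spark-submit-based launch honors them set this way is exactly my question):

export ZEPPELIN_JAVA_OPTS="-Dspark.driver.host=`hostname` \
  -Dspark.mesos.coarse=true -Dspark.executor.memory=20g \
  -Dspark.cores.max=80 \
  -Dspark.jars=${SOME_JAR} \
  -Dspark.executor.extraClassPath=${SOME_JAR}"

Note that -cp only changes the classpath of the local Zeppelin JVM, never of
the executors, which would explain code that runs locally but fails on the
cluster.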

Best,

David

>>>> On Mon, Aug 24, 2015 at 1:28 PM, David Salinas <
>>>> david.salinas.pro@gmail.com> wrote:
>>>>
>>>> I have looked at the SparkInterpreter.java code and this is indeed the
>>>> issue. Whenever one uses an instruction like z.input("..."), no Spark
>>>> transformation can work, because z is shipped to the slaves, where
>>>> Zeppelin is not installed, as shown by the example I sent.
>>>>
>>>> A workaround could be to interpret the variables separately (by
>>>> defining a map of variables before interpreting).
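>>>>
>>>> Concretely, a sketch of the idea (the helper names are hypothetical,
>>>> not Zeppelin's actual API): evaluate the z.* calls on the driver first,
>>>> then prepend the results as plain literals before compiling the
>>>> paragraph, so that user closures never reference z itself:
>>>>
>>>> // hypothetical pre-pass inside the interpreter
>>>> val variables: Map[String, String] =
>>>>   Map("s" -> z.input("Foo").toString)
>>>> val prelude = variables
>>>>   .map { case (name, value) => s"""val $name = "$value"""" }
>>>>   .mkString("\n")
>>>> interpret(prelude + "\n" + userCode) // interpret, userCode: hypothetical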
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Aug 24, 2015 at 6:45 PM, David Salinas <
>>>> david.salinas.pro@gmail.com> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>> I found another way to reproduce the problem:
>>>>
>>>> //cell 1 does not work
>>>>
>>>> val file = "hdfs://someclusterfile.json"
>>>> val s = z.input("Foo").toString
>>>> val textFile = sc.textFile(file)
>>>> textFile.filter(_.contains(s)).count
>>>> //org.apache.spark.SparkException: Job aborted due to stage failure:
>>>> Task 41 in stage 5.0 failed 4 times, most recent failure: Lost task 41.3 in
>>>> stage 5.0 (TID 2735,XXX.com ): java.lang.NoClassDefFoundError:
>>>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>>>
>>>> // cell 2 works
>>>>
>>>> val file = "hdfs://someclusterfile.json"
>>>> val s = "Y"
>>>> val textFile = sc.textFile(file)
>>>> textFile.filter(_.contains(s)).count
>>>> //res19: Long = 109
>>>>
>>>> This kind of issue also happens often when using variables from other
>>>> cells, and when taking a closure for a transformation. Maybe you are
>>>> reading variables inside the transformation with something like
>>>> z.get("s"), which causes z to be sent to the slaves because one of its
>>>> members is used (although I also sometimes hit this issue without using
>>>> anything from other cells).
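>>>>
>>>> A pattern that shrinks what the closure captures (a sketch; whether it
>>>> is enough here depends on what else the generated wrapper pulls in) is
>>>> to route the value through a method parameter, since a lambda that only
>>>> touches its own parameters has nothing else to serialize:
>>>>
>>>> import org.apache.spark.rdd.RDD
>>>>
>>>> def countContaining(rdd: RDD[String], needle: String): Long =
>>>>   rdd.filter(line => line.contains(needle)).count()
>>>>
>>>> val s = z.input("Foo").toString
>>>> countContaining(textFile, s) // only the String travels with the task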
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Aug 24, 2015 at 10:34 AM, David Salinas <
>>>> david.salinas.pro@gmail.com> wrote:
>>>>
>>>> Sorry I forgot to mention my environment:
>>>>
>>>> mesos 0.17, spark 1.4.1, scala 2.10.4, java 1.8
>>>>
>>>>
>>>>
>>>> On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <
>>>> david.salinas.pro@gmail.com> wrote:
>>>>
>>>> Hi Moon,
>>>>
>>>>
>>>> Today I cannot reproduce the bug with an elementary example either, but
>>>> it is still impacting all my notebooks. The weird thing is that calling
>>>> a transformation with map takes the Zeppelin context into the closure,
>>>> which gives these java.lang.NoClassDefFoundError:
>>>> Lorg/apache/zeppelin/spark/ZeppelinContext errors (the spark shell runs
>>>> the same command without any problem). I will try to find another
>>>> example that is more persistent (it is weird that this example was
>>>> failing yesterday). Do you have any idea what could cause the Zeppelin
>>>> context to be included in the closure?
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>> I have tested your code and cannot reproduce the problem.
>>>>
>>>>
>>>>
>>>> Could you share your environment? How did you configure Zeppelin with
>>>> Spark?
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> moon
>>>>

Re: Closure issue with spark 1.4.1

Posted by David Salinas <da...@gmail.com>.
Hi Moon,

Thanks for your reactivity; I will notify you of the result as soon as I
can.

Best,

David


Re: Closure issue with spark 1.4.1

Posted by moon soo Lee <mo...@apache.org>.
Hi,

I just pushed a patch for ZEPPELIN-262 at
https://github.com/apache/incubator-zeppelin/pull/270.
It'll take some time to be reviewed and merged into master.
Until then, you can try the branch of the pull request.
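
In case it helps, the usual way to fetch a pull request branch locally (PR
270, using GitHub's refs/pull/<n>/head convention), followed by the same
build command used earlier in this thread:

git clone https://github.com/apache/incubator-zeppelin.git
cd incubator-zeppelin
git fetch origin pull/270/head:pr-270
git checkout pr-270
mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1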

I believe it'll solve your problem, but let me know if you still have
problems after this patch.

Thanks,
moon


Re: Closure issue with spark 1.4.1

Posted by moon soo Lee <mo...@apache.org>.
Hi,

I'm testing a patch for ZEPPELIN-262 with some environments that I have.
I think I can create a pull request tonight.

Thanks,
moon


RE: Closure issue with spark 1.4.1

Posted by Steven Kirtzic <st...@statefarm.com>.
Hi Moon,

When are you guys targeting the release for ZEPPELIN-262? Thanks,

-Steven


Re: Closure issue with spark 1.4.1

Posted by moon soo Lee <mo...@apache.org>.
Hi David, Jerry,

There's a series of efforts underway to improve Spark integration.

Work with provided version of Spark
https://issues.apache.org/jira/browse/ZEPPELIN-160

Self diagnostics of configuration
https://issues.apache.org/jira/browse/ZEPPELIN-256

Use spark-submit to run spark interpreter process
https://issues.apache.org/jira/browse/ZEPPELIN-262
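
Roughly, this last one means the interpreter process gets launched through
spark-submit instead of Zeppelin assembling the classpath itself, so the
standard Spark mechanisms for distributing jars apply. An illustrative
sketch only; the actual class name and arguments are whatever the patch
defines:

${SPARK_HOME}/bin/spark-submit \
  --master "${MASTER}" \
  --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
  <interpreter jar> <port>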

On the mailing list I have seen many people struggle with configuring Spark
in Zeppelin across various environments.
ZEPPELIN-262 will virtually solve all the problems around configuring Spark.

Thanks for sharing your problems and feedback. That helps Zeppelin make
progress.

Best,
moon


Re: Closure issue with spark 1.4.1

Posted by Jerry Lam <ch...@gmail.com>.
Hi David,

We gave up on Zeppelin because of the lack of support. It seems that
Zeppelin has a lot of fancy features but lacks depth. Only time will tell
whether Zeppelin can overcome those limitations.

Good luck,

Jerry


Re: Closure issue with spark 1.4.1

Posted by David Salinas <da...@gmail.com>.
Hi all,

Has anyone been able to reproduce the error with the last code snippet I
gave? It fails 100% of the time on the cluster for me.
This serialization issue involving ZeppelinContext also comes up in many
other cases in my setting where it should not, since the same code works
fine in the spark shell.

Best regards,

David


Re: Closure issue with spark 1.4.1

Posted by Jerry Lam <ch...@gmail.com>.
Hi Zeppelin developers,

This issue sounds very serious. Is this specific to David's use case here?

Best Regards,

Jerry

On Mon, Aug 24, 2015 at 1:28 PM, David Salinas <da...@gmail.com>
wrote:

> I have looked at the SparkInterpreter.java code and this is indeed the
> issue. Whenever one uses an instruction such as z.input("..."), no spark
> transformation can work, as z will be shipped to the slaves, where
> Zeppelin is not installed, as shown by the example I sent.
> A workaround could be to interpret the variables separately (by defining
> a map of variables before interpreting).
>
> Best,
>
> David
>
>
> On Mon, Aug 24, 2015 at 6:45 PM, David Salinas <
> david.salinas.pro@gmail.com> wrote:
>
>> Hi Moon,
>>
>> I found another way to reproduce the problem:
>>
>> //cell 1 does not work
>> val file = "hdfs://someclusterfile.json"
>> val s = z.input("Foo").toString
>> val textFile = sc.textFile(file)
>> textFile.filter(_.contains(s)).count
>> //org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> 41 in stage 5.0 failed 4 times, most recent failure: Lost task 41.3 in
>> stage 5.0 (TID 2735, XXX.com): java.lang.NoClassDefFoundError:
>> Lorg/apache/zeppelin/spark/ZeppelinContext;
>>
>> // cell 2 works
>> val file = "hdfs://someclusterfile.json"
>> val s = "Y"
>> val textFile = sc.textFile(file)
>> textFile.filter(_.contains(s)).count
>> //res19: Long = 109
>>
>> This kind of issue also happens often when using variables from other
>> cells, and when taking a closure for a transformation. Maybe you are
>> reading variables inside the transformation with something like
>> "z.get("s")", which causes z to be sent to the slaves because one of its
>> members is used (although I sometimes also hit this issue without using
>> anything from other cells).
>>
>> Best,
>>
>> David
>>
>>
>> On Mon, Aug 24, 2015 at 10:34 AM, David Salinas <
>> david.salinas.pro@gmail.com> wrote:
>>
>>> Sorry I forgot to mention my environment:
>>> mesos 0.17, spark 1.4.1, scala 2.10.4, java 1.8
>>>
>>> On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <
>>> david.salinas.pro@gmail.com> wrote:
>>>
>>>> Hi Moon,
>>>>
>>>> Today I cannot reproduce the bug with an elementary example either, but
>>>> it is still impacting all my notebooks. The weird thing is that when
>>>> calling a transformation with map, it takes the Zeppelin Context into
>>>> the closure, which gives these java.lang.NoClassDefFoundError:
>>>> Lorg/apache/zeppelin/spark/ZeppelinContext errors (the spark shell runs
>>>> this command without any problem). I will try to find another example
>>>> that fails more consistently (it is odd that this example was failing
>>>> yesterday). Do you have any idea of what could cause the Zeppelin
>>>> Context to be included in the closure?
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>>
>>>> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <mo...@apache.org> wrote:
>>>>
>>>>> I have tested your code and cannot reproduce the problem.
>>>>>
>>>>> Could you share your environment? How did you configure Zeppelin with
>>>>> Spark?
>>>>>
>>>>> Thanks,
>>>>> moon
>>>>>
>>>>> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <
>>>>> david.salinas.pro@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem when using a spark closure. This error did not
>>>>>> appear with spark 1.2.1.
>>>>>>
>>>>>> I have included a reproducible example of what happens when taking the
>>>>>> closure (Zeppelin has been built from the head of master with this
>>>>>> command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
>>>>>> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
>>>>>> encountered this problem? All my previous notebooks are broken by this :(
>>>>>>
>>>>>> ------------------------------
>>>>>> val textFile = sc.textFile("hdfs://somefile.txt")
>>>>>>
>>>>>> val f = (s: String) => s+s
>>>>>> textFile.map(f).count
>>>>>> //works fine
>>>>>> //res145: Long = 407
>>>>>>
>>>>>>
>>>>>> def f(s:String) = {
>>>>>>     s+s
>>>>>> }
>>>>>> textFile.map(f).count
>>>>>>
>>>>>> //fails ->
>>>>>>
>>>>>> org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>> Task 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3
>>>>>> in stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
>>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext; at
>>>>>> java.lang.Class.getDeclaredFields0(Native Method) at
>>>>>> java.lang.Class.privateGetDeclaredFields(Class.java:2583) at
>>>>>> java.lang.Class.getDeclaredField(Class.java:2068) ...
>>>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>>> at
>>>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at
>>>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> David
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Closure issue with spark 1.4.1

Posted by David Salinas <da...@gmail.com>.
I have looked at the SparkInterpreter.java code and this is indeed the
issue. Whenever one uses an instruction such as z.input("..."), no spark
transformation can work, as z will be shipped to the slaves, where Zeppelin
is not installed, as shown by the example I sent.
A workaround could be to interpret the variables separately (by defining a
map of variables before interpreting).
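
Until something like that lands, here is a user-side sketch that may help
in a single Zeppelin cell (hypothetical code, untested against this build;
it assumes the input value is only needed on the driver): copy the input
into a block-local String, so the closure captures only the String and not
a wrapper holding z:

val count = {
  val q = z.input("Foo").toString   // read from the Zeppelin form on the driver only
  // the lambda below captures just q, a plain String, so nothing from
  // Zeppelin has to be deserialized on the slaves
  sc.textFile("hdfs://someclusterfile.json").filter(_.contains(q)).count
}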

Best,

David


On Mon, Aug 24, 2015 at 6:45 PM, David Salinas <da...@gmail.com>
wrote:

> Hi Moon,
>
> I found another way to reproduce the problem:
>
> //cell 1 does not work
> val file = "hdfs://someclusterfile.json"
> val s = z.input("Foo").toString
> val textFile = sc.textFile(file)
> textFile.filter(_.contains(s)).count
> //org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 41 in stage 5.0 failed 4 times, most recent failure: Lost task 41.3 in
> stage 5.0 (TID 2735, XXX.com): java.lang.NoClassDefFoundError:
> Lorg/apache/zeppelin/spark/ZeppelinContext;
>
> // cell 2 works
> val file = "hdfs://someclusterfile.json"
> val s = "Y"
> val textFile = sc.textFile(file)
> textFile.filter(_.contains(s)).count
> //res19: Long = 109
>
> This kind of issue also happens often when using variables from other
> cells, and when taking a closure for a transformation. Maybe you are
> reading variables inside the transformation with something like
> "z.get("s")", which causes z to be sent to the slaves because one of its
> members is used (although I sometimes also hit this issue without using
> anything from other cells).
>
> Best,
>
> David
>
>
> On Mon, Aug 24, 2015 at 10:34 AM, David Salinas <
> david.salinas.pro@gmail.com> wrote:
>
>> Sorry I forgot to mention my environment:
>> mesos 0.17, spark 1.4.1, scala 2.10.4, java 1.8
>>
>> On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <
>> david.salinas.pro@gmail.com> wrote:
>>
>>> Hi Moon,
>>>
>>> Today I cannot reproduce the bug with an elementary example either, but
>>> it is still impacting all my notebooks. The weird thing is that when
>>> calling a transformation with map, it takes the Zeppelin Context into
>>> the closure, which gives these java.lang.NoClassDefFoundError:
>>> Lorg/apache/zeppelin/spark/ZeppelinContext errors (the spark shell runs
>>> this command without any problem). I will try to find another example
>>> that fails more consistently (it is odd that this example was failing
>>> yesterday). Do you have any idea of what could cause the Zeppelin
>>> Context to be included in the closure?
>>>
>>> Best,
>>>
>>> David
>>>
>>>
>>> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <mo...@apache.org> wrote:
>>>
>>>> I have tested your code and cannot reproduce the problem.
>>>>
>>>> Could you share your environment? How did you configure Zeppelin with
>>>> Spark?
>>>>
>>>> Thanks,
>>>> moon
>>>>
>>>> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <
>>>> david.salinas.pro@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a problem when using a spark closure. This error did not
>>>>> appear with spark 1.2.1.
>>>>>
>>>>> I have included a reproducible example of what happens when taking the
>>>>> closure (Zeppelin has been built from the head of master with this
>>>>> command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
>>>>> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
>>>>> encountered this problem? All my previous notebooks are broken by this :(
>>>>>
>>>>> ------------------------------
>>>>> val textFile = sc.textFile("hdfs://somefile.txt")
>>>>>
>>>>> val f = (s: String) => s+s
>>>>> textFile.map(f).count
>>>>> //works fine
>>>>> //res145: Long = 407
>>>>>
>>>>>
>>>>> def f(s:String) = {
>>>>>     s+s
>>>>> }
>>>>> textFile.map(f).count
>>>>>
>>>>> //fails ->
>>>>>
>>>>> org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>> Task 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3
>>>>> in stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
>>>>> Lorg/apache/zeppelin/spark/ZeppelinContext; at
>>>>> java.lang.Class.getDeclaredFields0(Native Method) at
>>>>> java.lang.Class.privateGetDeclaredFields(Class.java:2583) at
>>>>> java.lang.Class.getDeclaredField(Class.java:2068) ...
>>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>>> at
>>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at
>>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at
>>>>>
>>>>> Best,
>>>>>
>>>>> David
>>>>>
>>>>
>>>
>>
>

Re: Closure issue with spark 1.4.1

Posted by David Salinas <da...@gmail.com>.
Hi Moon,

I found another way to reproduce the problem:

//cell 1 does not work
val file = "hdfs://someclusterfile.json"
val s = z.input("Foo").toString
val textFile = sc.textFile(file)
textFile.filter(_.contains(s)).count
//org.apache.spark.SparkException: Job aborted due to stage failure: Task
41 in stage 5.0 failed 4 times, most recent failure: Lost task 41.3 in
stage 5.0 (TID 2735, XXX.com): java.lang.NoClassDefFoundError:
Lorg/apache/zeppelin/spark/ZeppelinContext;

// cell 2 works
val file = "hdfs://someclusterfile.json"
val s = "Y"
val textFile = sc.textFile(file)
textFile.filter(_.contains(s)).count
//res19: Long = 109

This kind of issue also happens often when using variables from other
cells, and when taking a closure for a transformation. Maybe you are
reading variables inside the transformation with something like
"z.get("s")", which causes z to be sent to the slaves because one of its
members is used (although I sometimes also hit this issue without using
anything from other cells).
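
As a quick check (a hypothetical diagnostic, not an official Spark or
Zeppelin API usage; it assumes s from cell 1 is still in scope), one can
list the fields the compiler captured in the closure class; if a wrapper
whose type leads back to org.apache.zeppelin.spark.ZeppelinContext shows
up, the slaves will need that class to deserialize the task:

val pred = (line: String) => line.contains(s)
// print every field the anonymous function class actually captured
pred.getClass.getDeclaredFields.foreach(fld =>
  println(fld.getType.getName + " " + fld.getName))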

Best,

David


On Mon, Aug 24, 2015 at 10:34 AM, David Salinas <david.salinas.pro@gmail.com
> wrote:

> Sorry I forgot to mention my environment:
> mesos 0.17, spark 1.4.1, scala 2.10.4, java 1.8
>
> On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <
> david.salinas.pro@gmail.com> wrote:
>
>> Hi Moon,
>>
>> Today I cannot reproduce the bug with an elementary example either, but
>> it is still impacting all my notebooks. The weird thing is that when
>> calling a transformation with map, it takes the Zeppelin Context into
>> the closure, which gives these java.lang.NoClassDefFoundError:
>> Lorg/apache/zeppelin/spark/ZeppelinContext errors (the spark shell runs
>> this command without any problem). I will try to find another example
>> that fails more consistently (it is odd that this example was failing
>> yesterday). Do you have any idea of what could cause the Zeppelin
>> Context to be included in the closure?
>>
>> Best,
>>
>> David
>>
>>
>> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <mo...@apache.org> wrote:
>>
>>> I have tested your code and cannot reproduce the problem.
>>>
>>> Could you share your environment? How did you configure Zeppelin with
>>> Spark?
>>>
>>> Thanks,
>>> moon
>>>
>>> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <
>>> david.salinas.pro@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a problem when using a spark closure. This error did not
>>>> appear with spark 1.2.1.
>>>>
>>>> I have included a reproducible example of what happens when taking the
>>>> closure (Zeppelin has been built from the head of master with this
>>>> command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
>>>> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
>>>> encountered this problem? All my previous notebooks are broken by this :(
>>>>
>>>> ------------------------------
>>>> val textFile = sc.textFile("hdfs://somefile.txt")
>>>>
>>>> val f = (s: String) => s+s
>>>> textFile.map(f).count
>>>> //works fine
>>>> //res145: Long = 407
>>>>
>>>>
>>>> def f(s:String) = {
>>>>     s+s
>>>> }
>>>> textFile.map(f).count
>>>>
>>>> //fails ->
>>>>
>>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>>>> 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3 in
>>>> stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
>>>> Lorg/apache/zeppelin/spark/ZeppelinContext; at
>>>> java.lang.Class.getDeclaredFields0(Native Method) at
>>>> java.lang.Class.privateGetDeclaredFields(Class.java:2583) at
>>>> java.lang.Class.getDeclaredField(Class.java:2068) ...
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>>>> at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>
>>
>

Re: Closure issue with spark 1.4.1

Posted by David Salinas <da...@gmail.com>.
Sorry I forgot to mention my environment:
mesos 0.17, spark 1.4.1, scala 2.10.4, java 1.8

On Mon, Aug 24, 2015 at 10:32 AM, David Salinas <david.salinas.pro@gmail.com
> wrote:

> Hi Moon,
>
> Today I cannot reproduce the bug with an elementary example either, but
> it is still impacting all my notebooks. The weird thing is that when
> calling a transformation with map, it takes the Zeppelin Context into
> the closure, which gives these java.lang.NoClassDefFoundError:
> Lorg/apache/zeppelin/spark/ZeppelinContext errors (the spark shell runs
> this command without any problem). I will try to find another example
> that fails more consistently (it is odd that this example was failing
> yesterday). Do you have any idea of what could cause the Zeppelin
> Context to be included in the closure?
>
> Best,
>
> David
>
>
> On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <mo...@apache.org> wrote:
>
>> I have tested your code and cannot reproduce the problem.
>>
>> Could you share your environment? How did you configure Zeppelin with
>> Spark?
>>
>> Thanks,
>> moon
>>
>> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <
>> david.salinas.pro@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a problem when using a spark closure. This error did not
>>> appear with spark 1.2.1.
>>>
>>> I have included a reproducible example of what happens when taking the
>>> closure (Zeppelin has been built from the head of master with this
>>> command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
>>> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
>>> encountered this problem? All my previous notebooks are broken by this :(
>>>
>>> ------------------------------
>>> val textFile = sc.textFile("hdfs://somefile.txt")
>>>
>>> val f = (s: String) => s+s
>>> textFile.map(f).count
>>> //works fine
>>> //res145: Long = 407
>>>
>>>
>>> def f(s:String) = {
>>>     s+s
>>> }
>>> textFile.map(f).count
>>>
>>> //fails ->
>>>
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>>> 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3 in
>>> stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
>>> Lorg/apache/zeppelin/spark/ZeppelinContext; at
>>> java.lang.Class.getDeclaredFields0(Native Method) at
>>> java.lang.Class.privateGetDeclaredFields(Class.java:2583) at
>>> java.lang.Class.getDeclaredField(Class.java:2068) ...
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at
>>>
>>> Best,
>>>
>>> David
>>>
>>
>

Re: Closure issue with spark 1.4.1

Posted by David Salinas <da...@gmail.com>.
Hi Moon,

Today I cannot reproduce the bug with an elementary example either, but it
is still impacting all my notebooks. The weird thing is that when calling a
transformation with map, it takes the Zeppelin Context into the closure,
which gives these java.lang.NoClassDefFoundError:
Lorg/apache/zeppelin/spark/ZeppelinContext errors (the spark shell runs
this command without any problem). I will try to find another example that
fails more consistently (it is odd that this example was failing
yesterday). Do you have any idea of what could cause the Zeppelin Context
to be included in the closure?
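
One guess at the mechanism (a sketch with hypothetical names Ctx and Line,
not confirmed from the Zeppelin source): if the interpreter wraps each
paragraph in a line object that also holds z, then a closure touching any
val of that wrapper drags the whole wrapper along. That would also explain
the original def/val difference: a def is a method on the wrapper, so
textFile.map(f) builds a closure over the wrapper itself, while a val
function is a self-contained object. The same shape can be reproduced
outside Zeppelin, assuming an ambient SparkContext sc:

class Ctx extends Serializable                // stands in for ZeppelinContext
class Line extends Serializable {             // stands in for a wrapped notebook cell
  val ctx = new Ctx
  val s = "Y"
}
val line = new Line
val rdd = sc.parallelize(Seq("X", "Y"))
// _.contains(line.s) captures `line`, so a Ctx instance travels with each
// task; when the Ctx class file is absent from a slave's classpath,
// deserialization fails there with NoClassDefFoundError, the same shape
// as the error above
rdd.filter(_.contains(line.s)).count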

Best,

David

On Fri, Aug 21, 2015 at 6:29 PM, moon soo Lee <mo...@apache.org> wrote:

> I have tested your code and cannot reproduce the problem.
>
> Could you share your environment? How did you configure Zeppelin with
> Spark?
>
> Thanks,
> moon
>
> On Fri, Aug 21, 2015 at 2:25 AM David Salinas <da...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a problem when using a spark closure. This error did not
>> appear with spark 1.2.1.
>>
>> I have included a reproducible example of what happens when taking the
>> closure (Zeppelin has been built from the head of master with this
>> command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
>> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
>> encountered this problem? All my previous notebooks are broken by this :(
>>
>> ------------------------------
>> val textFile = sc.textFile("hdfs://somefile.txt")
>>
>> val f = (s: String) => s+s
>> textFile.map(f).count
>> //works fine
>> //res145: Long = 407
>>
>>
>> def f(s:String) = {
>>     s+s
>> }
>> textFile.map(f).count
>>
>> //fails ->
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3 in
>> stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
>> Lorg/apache/zeppelin/spark/ZeppelinContext; at
>> java.lang.Class.getDeclaredFields0(Native Method) at
>> java.lang.Class.privateGetDeclaredFields(Class.java:2583) at
>> java.lang.Class.getDeclaredField(Class.java:2068) ...
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at
>>
>> Best,
>>
>> David
>>
>

Re: Closure issue with spark 1.4.1

Posted by moon soo Lee <mo...@apache.org>.
I have tested your code and cannot reproduce the problem.

Could you share your environment? How did you configure Zeppelin with Spark?

Thanks,
moon

On Fri, Aug 21, 2015 at 2:25 AM David Salinas <da...@gmail.com>
wrote:

> Hi,
>
> I have a problem when using a spark closure. This error did not
> appear with spark 1.2.1.
>
> I have included a reproducible example of what happens when taking the
> closure (Zeppelin has been built from the head of master with this
> command: mvn install -DskipTests -Pspark-1.4 -Dspark.version=1.4.1
> -Dhadoop.version=2.2.0 -Dprotobuf.version=2.5.0). Has anyone ever
> encountered this problem? All my previous notebooks are broken by this :(
>
> ------------------------------
> val textFile = sc.textFile("hdfs://somefile.txt")
>
> val f = (s: String) => s+s
> textFile.map(f).count
> //works fine
> //res145: Long = 407
>
>
> def f(s:String) = {
>     s+s
> }
> textFile.map(f).count
>
> //fails ->
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 566 in stage 87.0 failed 4 times, most recent failure: Lost task 566.3 in
> stage 87.0 (TID 43396, XXX.com): java.lang.NoClassDefFoundError:
> Lorg/apache/zeppelin/spark/ZeppelinContext; at
> java.lang.Class.getDeclaredFields0(Native Method) at
> java.lang.Class.privateGetDeclaredFields(Class.java:2583) at
> java.lang.Class.getDeclaredField(Class.java:2068) ...
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924) at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000) at
>
> Best,
>
> David
>