Posted to dev@zeppelin.apache.org by Jhon Anderson Cardenas Diaz <jh...@gmail.com> on 2018/06/08 15:08:07 UTC

All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)

Dear community,

Currently we are having problems with multiple users running paragraphs
associated with pyspark jobs.

The problem is that if a user aborts/cancels his pyspark paragraph (job),
the active pyspark jobs of the other users are canceled too.

Going into detail, I've seen that when you cancel a user's job this method
is invoked (which is fine):

sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")

But, for some reason unknown to me, this method is also invoked:

sc.cancelAllJobs()
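
For context, here is roughly how the two calls differ in plain PySpark (a
minimal sketch; the group id below just mirrors Zeppelin's naming scheme and
is illustrative, not taken from Zeppelin's code):

import pyspark

sc = pyspark.SparkContext.getOrCreate()

# Jobs triggered after this call are tagged with the given group id
# (Zeppelin does the equivalent of this for each paragraph).
sc.setJobGroup("zeppelin-[notebook-id]-[paragraph-id]", "one user's paragraph")
sc.parallelize(range(1000)).count()  # runs under the group above

# Scoped cancel: only jobs tagged with this group id are affected.
sc.cancelJobGroup("zeppelin-[notebook-id]-[paragraph-id]")

# Global cancel: kills every active job in the shared SparkContext,
# no matter which user or paragraph started it.
sc.cancelAllJobs()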

I can tell because of the following trace, which appears in the logs of the
other users' jobs:

Py4JJavaError: An error occurred while calling o885.count.
: org.apache.spark.SparkException: Job 461 cancelled as part of cancellation of all jobs
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1375)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:721)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:721)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:721)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1628)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1925)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1938)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1965)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2386)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2788)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2385)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2392)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2420)
at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2419)
at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2801)
at org.apache.spark.sql.Dataset.count(Dataset.scala:2419)
at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o885.count.\n', JavaObject id=o886), <traceback object at 0x7f9e669ae588>)

Any idea why this could be happening?

(I have the 0.8.0 version, from September 2017.)

Thank you!

Re: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)

Posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com>.
Hi, we already have spark (and python) configured in per-user scoped mode,
and even in that case it does not work. But I will try your second option!
Thank you.

2018-06-12 21:24 GMT-05:00 Jeff Zhang <zj...@gmail.com>:

> This is a limitation of the native PySparkInterpreter.
>
> Two solutions for you:
> 1. Use per user scoped mode, so that each user owns his own python process.
> 2. Use the IPySparkInterpreter of zeppelin 0.8, which offers better
> integration of python with zeppelin.

Re: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)

Posted by Jeff Zhang <zj...@gmail.com>.
This is a limitation of the native PySparkInterpreter.

Two solutions for you:
1. Use per user scoped mode, so that each user owns his own python process.
2. Use the IPySparkInterpreter of zeppelin 0.8, which offers better
integration of python with zeppelin (see the sketch below).
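
For illustration, switching a paragraph to the IPython-based interpreter is
just a matter of the interpreter magic. A minimal sketch, assuming the spark
interpreter group is bound under its default name; the paragraph body itself
is illustrative:

%spark.ipyspark
# 'spark' and 'sc' are injected by Zeppelin in this interpreter.
df = spark.range(1000)
df.count()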




Re: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)

Posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com>.
Hi!

We found the reason why this error is happening. It seems to be related
to the solution
<https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd>
for the task ZEPPELIN-2075
<https://issues.apache.org/jira/browse/ZEPPELIN-2075>.

Because of this change, when one particular user cancels his py-spark
job, the py-spark jobs from *all* the users are canceled!

When a py-spark job is cancelled, the method PySparkInterpreter.interrupt()
is invoked, which sends SIGINT to the python process; the SIGINT handler
then cancels all the jobs in the same spark context:

context.py:

# create a signal handler which would be invoked on receiving SIGINT
def signal_handler(signal, frame):
    self.cancelAllJobs()
    raise KeyboardInterrupt()

Is this a zeppelin bug?
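
If it is, a narrower handler that cancels only the interrupted paragraph's
own job group would avoid the cross-user effect. A hypothetical sketch (not
Zeppelin's actual code; it assumes the group was set via sc.setJobGroup,
which Spark stores in the "spark.jobGroup.id" local property):

import signal

def make_sigint_handler(sc):
    def handler(sig, frame):
        # Cancel only the job group of the interrupted paragraph, instead
        # of every job in the shared SparkContext.
        group = sc.getLocalProperty("spark.jobGroup.id")
        if group:
            sc.cancelJobGroup(group)
        raise KeyboardInterrupt()
    return handler

# signal.signal(signal.SIGINT, make_sigint_handler(sc))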

Thank you.



Re: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)

Posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com>.
Hi!

We found the reason why this error is happening. It seems to be related
to the solution
<https://github.com/apache/zeppelin/commit/9f22db91c279b7daf6a13b2d805a874074b070fd>
for the task ZEPPELIN-2075
<https://issues.apache.org/jira/browse/ZEPPELIN-2075>.

Because of this change, when one particular user cancels his
py-spark job, the py-spark jobs from all the users are canceled.

When a py-spark job is cancelled, the method PySparkInterpreter.interrupt()
is invoked, and the resulting SIGINT handler cancels all the jobs in the
same spark context:

context.py:

# create a signal handler which would be invoked on receiving SIGINT
def signal_handler(signal, frame):
    self.cancelAllJobs()
    raise KeyboardInterrupt()



Re: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)

Posted by Jhon Anderson Cardenas Diaz <jh...@gmail.com>.
Hi!
I have the 0.8.0 version, from September 2017.


Re: All PySpark jobs are canceled when one user cancels his PySpark paragraph (job)

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
Which version do you use?


Best Regard,
Jeff Zhang


