Posted to user@spark.apache.org by Vitaliy Pisarev <vi...@biocatch.com> on 2018/03/11 15:46:15 UTC

Debugging a local spark executor in pycharm

I want to step through the work of a Spark executor running locally on my
machine, from PyCharm.

I am running explicit functionality, in the form of
dataset.foreachPartition(f), and I want to see what is going on inside f.
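
For context, f is just a plain function that receives an iterator over the
rows of one partition. A minimal, illustrative version of what I am running
(spark.range stands in for my real dataset):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    dataset = spark.range(100)        # stand-in for the real dataset

    def f(rows):
        # rows is an iterator over the Row objects of one partition;
        # this is the body I want to step into from PyCharm
        for row in rows:
            pass

    dataset.foreachPartition(f)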

Is there a straightforward way to do it, or do I need to resort to remote
debugging?

P.S. I posted this on SO
<https://stackoverflow.com/questions/49221733/debugging-a-local-spark-executor-in-pycharm>
as well.

Re: [EXT] Debugging a local spark executor in pycharm

Posted by Vitaliy Pisarev <vi...@biocatch.com>.
Actually, I stumbled on this SO page
<https://stackoverflow.com/questions/31245083/how-can-pyspark-be-called-in-debug-mode>.
It is not exactly straightforward, but it is a fairly simple solution.

In short:


   - I made sure there is only one executing task at a time by calling
   repartition(1) - this made it easy to locate the one and only Spark daemon.
   - I set a breakpoint wherever I needed to.
   - In order to "catch" the breakpoint, I put a print-out and a time.sleep(15)
   right before it. The print-out tells me that the daemon is up and running,
   and the sleep gives me time to push a few buttons so I can attach to the
   process (see the sketch after this list).
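
Roughly, with dataset and f as in my original mail, the partition function
then looks like this (the message and the 15-second sleep are arbitrary):

    import time

    def f(rows):
        # repartition(1) means there is only one Python worker to find;
        # announce that it is up and give myself ~15 seconds to attach
        print("executor is up - attach the debugger now")
        time.sleep(15)
        # the breakpoint goes anywhere below this line
        for row in rows:
            pass

    dataset.repartition(1).foreachPartition(f)

The attaching itself is done with PyCharm's attach-to-process action (under
the Run menu), picking the Python worker process once the print-out appears.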

It worked fairly well, and I was able to debug the executor. I did notice
two strange things. Sometimes I got an error and the debugger didn't
actually attach; this was not deterministic.

Other times there was a big gap between the point where I got the notification
and attached to the process and the point where execution resumed and I could
actually step through (by big gap I mean considerably longer than the sleep
period, usually about a minute).

Not perfect, but it worked most of the time.




Re: [EXT] Debugging a local spark executor in pycharm

Posted by Michael Mansour <Mi...@symantec.com>.
Vitaliy,

From what I understand, this is not possible to do.  However, let me share my workaround with you.

Assuming you have your debugger up and running in PyCharm, set a breakpoint inside the function, then take/collect/sample your data (consider a glom() first if it is critical that the data remain partitioned, then the take/collect), and pass it into the function directly (plain Python, no Spark). Use the debugger to step through the function on that small sample.
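
A minimal sketch of that idea, reusing dataset and f from the original mail:

    # pull one partition's worth of rows back to the driver; glom() keeps the
    # partition boundaries, so f sees the same shape of input it would see on
    # an executor (a plain take()/collect()/sample() works if that is not needed)
    one_partition = dataset.rdd.glom().first()

    # with a breakpoint set inside f, call it directly -- plain Python, no Spark
    f(iter(one_partition))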

Alternatively, you can open up the PyCharm execution module. In the execution module, do the same as above with the RDD and pass it into the function. This alleviates the need to write debugging code, etc. I find this approach useful and a bit faster, but it does not offer step-through capability.
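
Interactively, taking the "execution module" to mean PyCharm's Python console,
that amounts to something like:

    >>> rows = dataset.rdd.glom().first()   # one partition's rows as a plain list
    >>> f(iter(rows))                       # run f directly on the driver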

Best of luck!
M
--
Michael Mansour
Data Scientist
Symantec CASB