Posted to user@spark.apache.org by Marcelo Oikawa <ma...@webradar.com> on 2016/05/31 20:18:39 UTC

Debug Spark jobs on IntelliJ

Hello, list.

I'm trying to debug my Spark application in the IntelliJ IDE. Before I
submit my job, I run this command:

export SPARK_SUBMIT_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=4000"
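
The JDWP options above are the standard HotSpot debug agent flags, so they
break down the same way anywhere:

# transport=dt_socket : debug over a TCP socket
# server=y            : the JVM listens for an incoming debugger connection
# suspend=y           : the JVM waits for the debugger to attach before running
# address=4000        : the port an IntelliJ "Remote" run configuration connects to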

after that:

bin/spark-submit app-jar-with-dependencies.jar <arguments>

The IDE connects to the running job, but the code that runs on the worker
machines is unreachable to the debugger. See below:

rdd.foreachPartition(partition -> { // breakpoint stops here

    partition.forEachRemaining(message -> {

        // breakpoint doesn't stop here

    });
});

Does anyone know if this is possible? How? Any ideas?
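
For background: SPARK_SUBMIT_OPTS attaches the JDWP agent to the driver JVM
only, while the body of foreachPartition runs inside the executor JVMs on the
workers, which is why a debugger connected to the driver never reaches the
inner breakpoint when the job runs on a cluster. A sketch of also attaching an
agent to the executors, assuming a single executor per host and that port 5005
is free (spark.executor.extraJavaOptions is a standard Spark property;
suspend=n keeps executors from blocking until a debugger attaches):

bin/spark-submit \
  --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005" \
  app-jar-with-dependencies.jar <arguments>

A second "Remote" debug configuration in IntelliJ, pointed at the worker host
and port 5005, can then hit breakpoints inside the partition-processing code.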

Re: Debug Spark jobs on IntelliJ

Posted by Marcelo Oikawa <ma...@webradar.com>.
> This is Python, right? I'm not used to it; I'm used to Scala, so the syntax may be off.
>

No. It is Java.


> val toDebug = rdd.foreachPartition(partition -> { // breakpoint stops here
> // by "val toDebug" I mean: assign the result of foreachPartition to a variable
>     partition.forEachRemaining(message -> {
>         // breakpoint doesn't stop here
>
>     });
> });
>
> toDebug.first // now is when this method will run
>

foreachPartition is a void method.
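
In other words, there is nothing to assign and nothing to call first on. For
comparison, a minimal Java sketch of a value-returning action, where the
JavaRDD<String> rdd is hypothetical:

// map is a lazy transformation; collect is the action that ships results
// back to the driver as a java.util.List, easy to inspect in the debugger
List<Integer> lengths = rdd.map(String::length).collect();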



Re: Debug Spark jobs on IntelliJ

Posted by Dirceu Semighini Filho <di...@gmail.com>.
Try this. (This is Python, right? I'm not used to it; I'm used to Scala,
so the syntax may be off.)

val toDebug = rdd.foreachPartition(partition -> { // breakpoint stops here
// by "val toDebug" I mean: assign the result of foreachPartition to a variable
    partition.forEachRemaining(message -> {
        // breakpoint doesn't stop here

    });
});

toDebug.first // now is when this method will run



Re: Debug Spark jobs on IntelliJ

Posted by Marcelo Oikawa <ma...@webradar.com>.
> Hi Marcelo, this is because operations on RDDs are lazy; you will only
> stop at the breakpoint inside foreach when you call a first, a collect, or
> a reduce operation.
>

Isn't forEachRemaining a terminal method like first, collect, or reduce?
Anyway, I guess laziness is not the problem itself, because the code inside
forEachRemaining runs fine; I just can't debug that block.
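
For what it's worth, forEachRemaining is not a Spark operation at all: it is
java.util.Iterator's Java 8 default method, and it runs eagerly wherever the
iterator lives, here inside the executor-side closure. A minimal standalone
sketch:

// plain Java 8 iteration (java.util.Arrays, java.util.Iterator);
// no Spark scheduling involved
Iterator<String> it = Arrays.asList("a", "b").iterator();
it.forEachRemaining(s -> System.out.println(s)); // executes immediately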


> This is when Spark will run the operations.
> Have you tried that?
>
> Cheers.

Re: Debug Spark jobs on IntelliJ

Posted by Dirceu Semighini Filho <di...@gmail.com>.
Hi Marcelo, this is because operations on RDDs are lazy; you will only
stop at the breakpoint inside foreach when you call a first, a collect, or
a reduce operation.
This is when Spark will run the operations.
Have you tried that?
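
A minimal Java illustration of that laziness, assuming a hypothetical
JavaSparkContext sc:

// map is a transformation: building 'doubled' triggers no work
JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3));
JavaRDD<Integer> doubled = numbers.map(x -> x * 2);
// count is an action: the job actually executes here
long n = doubled.count();

(Note that foreachPartition is itself an action, so the code in the original
post does execute; the question is why the breakpoint inside it isn't hit.)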

Cheers.
