Posted to dev@spark.apache.org by "Yiming (John) Zhang" <sd...@gmail.com> on 2014/11/19 05:00:18 UTC

Intro to using IntelliJ to debug Spark 1.1 apps with mvn/sbt (for beginners)

Hi,

 

I noticed it is hard to find a thorough introduction to using IntelliJ to
debug Spark 1.1 apps with mvn/sbt, and the process is not straightforward
for beginners. I spent several days figuring it out, and I hope this
write-up is helpful for beginners like me and that professionals can help
me improve it. (A version with figures can be found at:
http://kylinx.com/spark/Debug-Spark-in-IntelliJ.htm)

 

(1) Install the Scala plugin

 

(2) Download, unzip and open spark-1.1.0 in IntelliJ 

a) mvn: File -> Open. 

    Select the Spark source folder (e.g., /root/spark-1.1.0). The first
import may take a long time, as IntelliJ downloads and compiles many
dependencies.

b) sbt: File -> Import Project. 

    Select "Import project from external model", then choose SBT project,
click Next. Input the Spark source path (e.g., /root/spark-1.1.0) for "SBT
project", and select Use auto-import.

 

(3) First, compile and run the Spark examples from the console to make sure
everything is OK. Use mvn or sbt, depending on your build tool:

# mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package

# ./sbt/sbt assembly -Phadoop-2.2 -Dhadoop.version=2.2.0
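
Once the build succeeds, a quick sanity check from the console (this is
just my own verification step, not required by the later steps; the
run-example script ships with Spark 1.1 and expands the short name to
org.apache.spark.examples.LogQuery):

# ./bin/run-example LogQuery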

 

(4) Add the compiled Spark assembly (spark-assembly-1.1.0-hadoop2.2.0) to
"Libraries" (File -> Project Structure -> Libraries -> green +), and choose
the modules that use it (right-click the library and click "Add to
Modules"). It seems only the spark-examples module needs it.

 

(5) In the "Dependencies" page of the modules using this library, make sure
the "Scope" of this library is "Compile" (File -> Project Structure ->
Modules).

(6) For sbt, it seems we have to mark the scope of all other Hadoop
dependencies (SBT: org.apache.hadoop.hadoop-*) as "Test" (perhaps because
of a poor Internet connection?). This has to be redone every time IntelliJ
is opened (possibly due to a bug?).

 

(7) Configure the debug environment (using LogQuery as an example): Run ->
Edit Configurations. Pick the mvn or the sbt "Before launch" tool below to
match the build tool you used in step (3). (An alternative that attaches
the debugger to an externally launched process is sketched after this
configuration.)

Main class: org.apache.spark.examples.LogQuery

VM options: -Dspark.master=local

Working directory: /root/spark-1.1.0

Use classpath of module: spark-examples_2.10

Before launch: External tool: mvn

    Program: /root/Programs/apache-maven-3.2.1/bin/mvn

    Parameters: -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests package

    Working directory: /root/spark-1.1.0

Before launch: External tool: sbt

    Program: /root/spark-1.1.0/sbt/sbt

    Parameters: -Phadoop-2.2 -Dhadoop.version=2.2.0 assembly 

    Working directory: /root/spark-1.1.0
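
As an alternative to launching from IntelliJ, you can start the example
outside the IDE and attach the debugger to it. This is plain JVM remote
debugging, nothing Spark-specific; port 5005 is my own choice, and the
examples-jar path may differ on your machine:

# ./bin/spark-submit --class org.apache.spark.examples.LogQuery --master local \
    --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
    examples/target/scala-2.10/spark-examples-1.1.0-hadoop2.2.0.jar

Then create a "Remote" run configuration in IntelliJ (host localhost, port
5005) and start it with Run -> Debug; with suspend=y the JVM waits until
the debugger attaches.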

 

(8) Click Run -> Debug 'LogQuery' to start debugging
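
The run configuration works for any main class on the module's classpath,
so you can debug your own application the same way. A minimal sketch in
Scala (the object name DebugDemo and the sample numbers are mine; the rest
is the standard Spark 1.1 API):

import org.apache.spark.{SparkConf, SparkContext}

object DebugDemo {
  def main(args: Array[String]): Unit = {
    // The master comes from the -Dspark.master=local VM option set in (7)
    val conf = new SparkConf().setAppName("DebugDemo")
    val sc = new SparkContext(conf)
    // Set a breakpoint here and step through the RDD operations
    val counts = sc.parallelize(1 to 100).map(_ % 10).countByValue()
    println(counts)
    sc.stop()
  }
}

Point "Main class" at DebugDemo (and "Use classpath of module" at whichever
module contains it), then debug as above.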

 

 

Cheers,

Yiming


Re: Intro to using IntelliJ to debug Spark 1.1 apps with mvn/sbt (for beginners)

Posted by "Chester @work" <ch...@alpinenow.com>.
For sbt, you can simply run

sbt/sbt gen-idea

to generate the IntelliJ IDEA project modules for you. You can then just
open the generated project, which includes all the needed dependencies.
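
(gen-idea is provided by the sbt-idea plugin; I believe Spark's sbt build
already includes it, but treat that as an assumption -- if the command is
unknown, you would need to add the plugin yourself. Run it from the Spark
source root:)

# cd /root/spark-1.1.0
# sbt/sbt gen-idea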

Sent from my iPhone

> On Nov 18, 2014, at 8:26 PM, Chen He <ai...@gmail.com> wrote:
> 
> Thank you Yiming. It is helpful.
> 
> Regards!
> 
> Chen
> 


Re: Intro to using IntelliJ to debug Spark 1.1 apps with mvn/sbt (for beginners)

Posted by Chen He <ai...@gmail.com>.
Thank you Yiming. It is helpful.

Regards!

Chen
