Posted to dev@spark.apache.org by lonikar <lo...@gmail.com> on 2015/09/11 18:54:59 UTC

Spark 1.5.0: setting up debug env

I have set up a Spark debug environment on Windows and Mac, and thought it's worth
sharing given some of the issues I encountered: the instructions given here
<https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse>
did not work for *Eclipse* (possibly outdated now). The first step, "sbt/sbt"
or "build/sbt", hangs while downloading sbt with the message "Getting
org.scala-sbt sbt 0.13.7 ...". I tried the alternative "build/mvn
eclipse:eclipse", but that too failed, as the generated .classpath files
contained classpathentry elements only for Java files.

1. Build Spark using Maven on the command line. This downloads all the
necessary jars from the Maven repos and speeds up the Eclipse build. Maven
3.3.3 is required; Spark ships with it. Just use build/mvn and ensure that
there is no "mvn" command in PATH (build/mvn -Pyarn -Phadoop-2.6
-Dhadoop.version=2.6.0 -DskipTests clean package).
2. Download the latest Scala IDE (4.1.1 as of now) from http://scala-ide.org
3. Check whether the Eclipse Scala Maven plugin is installed. If not, install it:
Help --> Install New Software -->
http://alchim31.free.fr/m2e-scala/update-site/, which is sourced from
https://github.com/sonatype/m2eclipse-scala.
4. If using Scala 2.10, add the 2.10.4 installation. If you build Spark using
the steps described here
<http://spark.apache.org/docs/latest/building-spark.html> (build/mvn
-Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package), it
gets installed in build/scala-2.10.4. In Eclipse Preferences -> Scala ->
Installations -> Add, specify <spark-dir>/build/scala-2.10.4/lib.
5. In Eclipse -> Project, disable Build Automatically. This avoids building
projects until all of them are imported and some settings are changed.
Otherwise, Eclipse spends hours building projects in a half-baked state.
6. In Eclipse -> Preferences -> Java -> Compiler -> Errors/Warnings -->
Deprecated and Restricted API, change the setting from Error to Warning.
This takes care of the Unsafe classes used by project Tungsten.
7. Import the Maven projects: in Eclipse, File --> Import --> Maven --> Existing
Maven Projects (*not General --> Existing Projects into Workspace*).
8. After the projects are completely imported, select all projects except
java8-tests_2.10, spark-assembly_2.10 and spark-parent_2.10, right-click and
choose Scala -> Set the Scala Installation, then choose 2.10.4. This step is also
described here
<https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse>
. It does not work for some projects; for those, right-click each project,
go to Properties -> Scala Compiler, check Use Project Settings, select the
2.10.4 Scala installation and click OK.
9. Some projects will give the error "Plugin execution not covered by lifecycle
configuration" when building. The issue is described here
<http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin>
. The pom.xml of those projects needs <pluginManagement> ...
</pluginManagement> wrapped around the <plugins> element, like below:
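
(Only the wrapping <pluginManagement> element is new; the existing <plugin>
entries inside <plugins> stay unchanged.)

    <build>
      <pluginManagement>
        <plugins>
          <!-- existing <plugin> entries, unchanged -->
          ...
        </plugins>
      </pluginManagement>
    </build>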

 The projects which need this change are spark-streaming-flume-sink_2.10
(external/flume-sink/pom.xml), spark-repl_2.10 (repl/pom.xml),
spark-sql_2.10 (sql/pom.xml), spark-hive_2.10 (sql/hive/pom.xml),
spark-hive-thriftserver_2.10 (sql/hive-thriftserver/pom.xml),
spark-unsafe_2.10 (unsafe/pom.xml).
10. Right click on project spark-streaming-flume-sink_2.10, Properties ->
Java Build Path -> Source -> Add Folder. Navigate to target -> scala-2.10 ->
src_managed -> main -> compiled_avro. Check the checkbox and click OK.
11. Now enable Project -> Build Automatically. Sit back and relax. If the build
fails for some projects (SBT crashes sometimes), just select those and run
Project -> Clean -> Clean selected projects.
12. After the build completes (hopefully without any errors), run/debug an
example from spark-examples_2.10. You should be able to put breakpoints in
Spark code and debug. You may have to change the example's source to add
.setMaster("local") on the "val sparkConf" line (see the sketch below).
After this minor change, it will work. Also, the first time you debug,
Eclipse will ask you to specify the source lookup path. Just select Add ->
Java Project -> select all Spark projects. Let the first debugging session
run to completion, as it will not show any Spark code; you may disable
breakpoints in this session to let it finish. Subsequent sessions allow you
to step through Spark code. Enjoy
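
For reference, a minimal example with that change applied might look like the
sketch below (the object name, the app name and the tiny job are placeholders,
not taken from spark-examples):

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalDebugExample {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf()
          .setAppName("LocalDebugExample") // placeholder app name
          .setMaster("local")              // the added call: run in-process so breakpoints in Spark code are hit
        val sc = new SparkContext(sparkConf)
        // any small job is enough to step into Spark internals
        println(sc.parallelize(1 to 100).sum())
        sc.stop()
      }
    }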

You may not have to go through all this if you are using Scala 2.11 or IntelliJ.
But if you are like me, using Eclipse and Spark's current Scala 2.10.4, you
will find this useful and it will save you a lot of googling.

The one issue I encountered is debugging/setting breakpoints in the Java code
generated for expressions. This code is generated as strings in
spark-catalyst_2.10, in org.apache.spark.sql.catalyst.expressions and
org.apache.spark.sql.catalyst.expressions.codegen. If anyone has figured out
how to do it, please update this thread.



