Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2012/09/11 22:27:47 UTC

[Nutch Wiki] Update of "RunNutchInEclipse" by SebastianNagel

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "RunNutchInEclipse" page has been changed by SebastianNagel:
http://wiki.apache.org/nutch/RunNutchInEclipse?action=diff&rev1=37&rev2=38

Comment:
using Java remote debugger, debugging and timeouts

  Generator$Selector [line: 108] - map
  OutlinkExtractor [line: 71 & 74] - getOutlinks
  }}}
+ 
+ === Remote Debugging in Eclipse ===
+  1. Create a new Debug Configuration of type [[http://help.eclipse.org/juno/index.jsp?topic=%2Forg.eclipse.jdt.doc.user%2Ftasks%2Ftask-remotejava_launch_config.htm|Remote Java Application]] and note the port (here: 37649).
+  1. Launch Nutch from the command line, adding options to load the [[http://docs.oracle.com/javase/6/docs/technotes/guides/jpda/architecture.html#jdwp|Java Debugger JDWP Agent Library]], e.g. from bash:
+ {{{
+ % export NUTCH_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=localhost:37649"
+ % $NUTCH_HOME/bin/nutch parsechecker http://myurl.com/
+ }}}
+  1.#3 The application is suspended right after launch and waits for a debugger to attach.
+  1. Now go to Eclipse, set the appropriate breakpoints, and run the Debug Configuration created in step 1.
+ This way there is no need to create a separate launch configuration for every tool you want to debug: a single Debug Configuration is enough to debug any tool (parsechecker, indexchecker, URL filter, etc.), even remotely (crawler/tool running on a server, Eclipse debugger running locally).
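
Since the same agent options are reused for every tool, the export can be wrapped in a small shell function. The following is only a sketch: `jdwp_opts` is a hypothetical helper name, not part of Nutch, and the default port 37649 is simply the one used in the steps above.

```shell
# Hypothetical helper (not part of Nutch): build the JDWP agent option string.
#   $1 = port the debugger attaches to (default: 37649, as in the steps above)
#   $2 = suspend flag: y = JVM waits for the debugger, n = JVM starts right away
jdwp_opts() {
  port="${1:-37649}"
  suspend="${2:-y}"
  printf '%s' "-agentlib:jdwp=transport=dt_socket,server=y,suspend=${suspend},address=localhost:${port}"
}

# Same setting as in the steps above.
NUTCH_OPTS="$(jdwp_opts 37649 y)"
export NUTCH_OPTS
echo "$NUTCH_OPTS"

# Then launch whichever tool should be debugged, e.g.:
#   $NUTCH_HOME/bin/nutch parsechecker http://myurl.com/
```

For short-running tools such as parsechecker, `suspend=y` is probably the safer choice, because with `suspend=n` the JVM may already have finished before Eclipse attaches.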
+ 
+ === Debugging and Timeouts ===
+ Debugging takes time, especially when inspecting variables, stack traces, etc., often so much that a timeout expires and stops the application. In the nutch-site.xml used for debugging, set the relevant timeouts to a rather high value (or -1 for unlimited), e.g., when debugging the parser:
+ {{{
+ <property>
+   <name>parser.timeout</name>
+   <value>-1</value>
+ </property>
+ }}}
+ 
  == If things do not work... ==
  Yes, Nutch and Eclipse can be a difficult companionship sometimes ;-)