Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2012/06/20 14:49:16 UTC

[Nutch Wiki] Update of "GORA_HBase" by FerdyGalema

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "GORA_HBase" page has been changed by FerdyGalema:
http://wiki.apache.org/nutch/GORA_HBase?action=diff&rev1=14&rev2=15

Comment:
redirect to Nutch2Tutorial

+ #REDIRECT Nutch2Tutorial
- = Nutch 2.0 Tutorial =
- {{http://www.interadvertising.co.uk/files/nutch_logo_medium.gif}} {{http://gora.apache.org/images/gora-logo.png}} {{http://hbase.apache.org/images/hbase_logo.png}}
  
- This document describes how to get Nutch 2.0 to use HBase as a storage backend for Gora.
- 
-  * Grab a distribution of Nutch 2.X from [[http://www.apache.org/dyn/closer.cgi/nutch/|here]]
-  * Install and configure HBase. You can get it [[http://www.apache.org/dyn/closer.cgi/hbase/|here]] ('''N.B.''' Gora 0.2 uses HBase 0.90.4; however, the setup is known to work with more recent versions of HBase.)
-  * Specify the GORA backend in nutch-site.xml
- 
- {{{
- <property>
-  <name>storage.data.store.class</name>
-  <value>org.apache.gora.hbase.store.HBaseStore</value>
-  <description>Default class for storing data</description>
- </property>
- }}}
- 
-  * Ensure the HBase gora-hbase dependency is available in ivy/ivy.xml
- 
- {{{
-     <!-- Uncomment this to use HBase as Gora backend. -->
-     
-     <dependency org="org.apache.gora" name="gora-hbase" rev="0.2" conf="*->default" />
-     
- }}}
- 
-  * Compile Nutch by running ''ant runtime''
-  * Make sure HBase is started and working properly as per the quick start tutorial [[http://hbase.apache.org/book/quickstart.html|here]]
- 
- You should then be able to use it. Try going to ''$NUTCH_HOME/runtime/local/bin'' and running:
- 
- {{{
-   nutch inject /someseedDir
-   nutch readdb
- }}}
- 
- You should find more details in the log file ''$NUTCH_HOME/runtime/local/logs/hadoop.log''.
- 
- For more details on the command-line options, please see [[http://wiki.apache.org/nutch/CommandLineOptions|here]], or run ./bin/nutch, which prints usage to stdout.
- Finally, for a more detailed Nutch (1.X) tutorial, please see [[http://wiki.apache.org/nutch/NutchTutorial|here]].
- 
- '''back to FrontPage'''
-