Posted to dev@nutch.apache.org by "Andrew McCall (JIRA)" <ji...@apache.org> on 2009/02/19 14:54:01 UTC

[jira] Commented: (NUTCH-650) Hbase Integration

    [ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674996#action_12674996 ] 

Andrew McCall commented on NUTCH-650:
-------------------------------------

Hi Doğacan, 

I've been running this on a pseudo-distributed hadoop/hbase install that I set up specifically for testing and development on my own. To get it to run out of the box I needed to make a couple of changes; I've attached a patch with them.  

The patch is against the git repository, btw. 

1) I changed nutch-default.xml to use the hbase classes. 
2) I changed the parse-plugins.xml file to use the hbase classes.
3) I tweaked IndexerHbase so that the output is wrapped in a hadoop.io.Text; otherwise it wouldn't work for me. 
4) I altered WebTableCreator so that it can be executed from the command line like any other command, and made the table name an argument, as it is for the other commands. 

This should now check out, compile, and allow the following commands to be run:

./bin/nutch org.apache.nutchbase.util.hbase.WebTableCreator webtable

./bin/nutch org.apache.nutchbase.crawl.InjectorHbase webtable file:///path/to/urls_dir

./bin/nutch org.apache.nutchbase.crawl.GeneratorHbase webtable

./bin/nutch org.apache.nutchbase.fetcher.FetcherHbase webtable

./bin/nutch org.apache.nutchbase.parse.ParseTable webtable

./bin/nutch org.apache.nutchbase.indexer.IndexerHbase /index webtable

./bin/nutch org.apache.nutchbase.crawl.UpdateTable webtable
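The command sequence above can be wrapped in a small shell helper so that one round runs against a single table name. This is a minimal sketch, not part of the attached patch: the function names `run_step` and `crawl_cycle` and the `NUTCH` variable are hypothetical. It only replays the listed commands in the listed order, and it assumes each tool reports failure through a non-zero exit status. Whether WebTableCreator can safely be re-run against an existing table is not covered here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached patch): replays the nutchbase
# commands listed above, in the same order, for one table name.

# Launcher to use; defaults to ./bin/nutch. Set NUTCH=echo for a dry run.
NUTCH="${NUTCH:-./bin/nutch}"

# Print the step, then run it through the launcher.
run_step() {
  echo ">> $*"
  "$NUTCH" "$@"
}

# crawl_cycle TABLE URL_DIR INDEX_DIR
# Stops at the first step that exits non-zero (assumes each tool
# reports failure through its exit status).
crawl_cycle() {
  if [ "$#" -ne 3 ]; then
    echo "usage: crawl_cycle TABLE URL_DIR INDEX_DIR" >&2
    return 2
  fi
  table="$1"
  urls="$2"
  index="$3"

  run_step org.apache.nutchbase.util.hbase.WebTableCreator "$table" || return 1
  run_step org.apache.nutchbase.crawl.InjectorHbase "$table" "$urls" || return 1
  run_step org.apache.nutchbase.crawl.GeneratorHbase "$table" || return 1
  run_step org.apache.nutchbase.fetcher.FetcherHbase "$table" || return 1
  run_step org.apache.nutchbase.parse.ParseTable "$table" || return 1
  # Argument order matches the listing: index directory first, then the table.
  run_step org.apache.nutchbase.indexer.IndexerHbase "$index" "$table" || return 1
  run_step org.apache.nutchbase.crawl.UpdateTable "$table" || return 1
}
```

For example, after sourcing the script from the Nutch checkout, `crawl_cycle webtable file:///path/to/urls_dir /index` runs the steps in order; setting `NUTCH=echo` first prints the commands without executing them.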

I've been running a test crawl using this code and it seems to be working well for me. 



> Hbase Integration
> -----------------
>
>                 Key: NUTCH-650
>                 URL: https://issues.apache.org/jira/browse/NUTCH-650
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>            Reporter: Doğacan Güney
>            Assignee: Doğacan Güney
>             Fix For: 1.1
>
>         Attachments: hbase-integration_v1.patch, hbase_v2.patch, nutch-habase.patch
>
>
> This issue will track nutch/hbase integration
