Posted to user@nutch.apache.org by ShiQing Ma <sh...@gmail.com> on 2011/09/26 16:35:43 UTC

Nutch and Hadoop

Hi. Sorry to bother.

I am a student doing some research on Nutch, Hadoop and HBase, and I would
like to use them together. My question is how to write data produced by
Nutch directly to HBase. Is that possible, or did I just miss some important
information in your wiki?

Also, in the future I would like to write my own algorithm, which is
different from PageRank and the like. I would like to know whether a
friendlier way of getting at the data could be provided, so that the
algorithms can be changed.

I would appreciate it if you could write back soon. Thank you for sparing me
your time.

Sincerely,
Shiqing Ma

Re: Nutch and Hadoop

Posted by Alexander Aristov <al...@gmail.com>.
Study Nutch 2.0.

Best Regards
Alexander Aristov


Re: Nutch and Hadoop

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Shiqing Ma,

Please see below

On Mon, Sep 26, 2011 at 3:35 PM, ShiQing Ma <sh...@gmail.com> wrote:

> Hi. Sorry to bother.
>
> I am a student doing some research on Nutch, Hadoop and HBase, and I would
> like to use them together. My question is how to write data produced by
> Nutch directly to HBase. Is that possible, or did I just miss some important
> information in your wiki?
>
This has been discussed briefly in the past. Currently, the way to write
data to HBase is via nutchgora [1], which uses Gora [2] as a storage
abstraction layer. As Alexander suggested, please do some reading into the
architecture and how it differs from the main 1.X development line. Note
that nutchgora is not yet production ready: there are some pending issues
which have to be addressed. If you have a look at our Jira you will get a
taste of what these are.
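
To make the idea of a storage abstraction layer a little more concrete, here
is a minimal Java sketch. It is only an illustration: every name in it
(PageStore, InMemoryPageStore, Crawler) is hypothetical and is not Gora's or
Nutch's actual API. The point it tries to show is that the crawler code
depends only on an interface, so the backend behind it (an in-memory map here,
something like HBase in practice) can be swapped without touching the crawler.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout: this is NOT Gora's API, only an
// illustration of what a storage abstraction layer does.
interface PageStore {
    void put(String url, String content);
    String get(String url);
}

// One possible backend: a plain in-memory map.
// A different implementation of PageStore could write to HBase instead.
class InMemoryPageStore implements PageStore {
    private final Map<String, String> rows = new HashMap<>();

    @Override
    public void put(String url, String content) {
        rows.put(url, content);
    }

    @Override
    public String get(String url) {
        return rows.get(url); // null when the URL was never stored
    }
}

// Crawler-side code sees only the interface, never a concrete backend.
class Crawler {
    private final PageStore store;

    Crawler(PageStore store) {
        this.store = store;
    }

    void record(String url, String content) {
        store.put(url, content);
    }
}

public class StorageAbstractionSketch {
    public static void main(String[] args) {
        PageStore store = new InMemoryPageStore();
        new Crawler(store).record("http://example.org/", "<html>hello</html>");
        System.out.println(store.get("http://example.org/"));
    }
}
```

Swapping the backend in this sketch means passing a different PageStore
implementation to the Crawler constructor; how nutchgora actually selects its
backend is described in the branch itself [1].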


> Also, in the future I would like to write my own algorithm, which is
> different from PageRank and the like. I would like to know whether a
> friendlier way of getting at the data could be provided, so that the
> algorithms can be changed.
>
I think there needs to be a bit more clarity on what this covers. I
cannot comment as it is too vague.


>
> I would appreciate it if you could write back soon. Thank you for sparing me
> your time.
>
> Sincerely,
> Shiqing Ma
>

[1] https://svn.apache.org/repos/asf/nutch/branches/nutchgora/
[2] http://incubator.apache.org/gora/
-- 
*Lewis*