Posted to user@nutch.apache.org by weishenyun <wl...@yahoo.com.cn> on 2012/09/13 05:36:18 UTC

Problems in Nutch 2.0 with HBase storage

Hi everyone,
I have recently been using Nutch 2.0 to crawl pages and store them in HBase, but
I don't know much about the HBase table schema in Nutch 2.0. There are many
column families and qualifiers with short names such as f:bas, f:st, f:ts,
f:cnt, etc. Can someone explain the schema of these column families
and qualifiers in detail? For example, if I want to get the crawl status (like
an HTTP status code) and the retry counter of a page, which columns should I
refer to?



--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-in-Nutch-2-0-with-HBase-storage-tp4007355.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: Problems in Nutch 2.0 with HBase storage

Posted by weishenyun <wl...@yahoo.com.cn>.
Thanks very much!



--
View this message in context: http://lucene.472066.n3.nabble.com/Problems-in-Nutch-2-0-with-HBase-storage-tp4007355p4007420.html

Re: Problems in Nutch 2.0 with HBase storage

Posted by Ferdy Galema <fe...@kalooga.com>.
Hi,

Just check the mapping file (gora-hbase-mapping.xml). It maps every field
of the webpage record to its HBase column (family:qualifier).
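
As a rough illustration of reading that mapping, here is a minimal Python sketch that parses a mapping document and prints each field next to its family:qualifier column. The embedded XML is only an illustrative excerpt: the columns f:bas, f:st, f:ts and f:cnt come from the question, but the field names attached to them, and the retriesSinceFetch and protocolStatus entries, are assumptions that should be checked against the gora-hbase-mapping.xml shipped with your own Nutch installation (typically in the conf/ directory). The helper name load_field_columns is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative excerpt, NOT the authoritative file. Field names and the
# "rsf"/"prot" qualifiers are assumptions; verify them in your own
# gora-hbase-mapping.xml before relying on them.
SAMPLE_MAPPING = """
<gora-orm>
  <table name="webpage">
    <family name="f" maxVersions="1"/>
  </table>
  <class table="webpage" keyClass="java.lang.String"
         name="org.apache.nutch.storage.WebPage">
    <field name="baseUrl" family="f" qualifier="bas"/>
    <field name="status" family="f" qualifier="st"/>
    <field name="fetchTime" family="f" qualifier="ts"/>
    <field name="content" family="f" qualifier="cnt"/>
    <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
    <field name="protocolStatus" family="f" qualifier="prot"/>
  </class>
</gora-orm>
"""


def load_field_columns(xml_text):
    """Return {field name: 'family:qualifier'} for every <field> element."""
    root = ET.fromstring(xml_text.strip())
    columns = {}
    for field in root.iter("field"):
        family = field.get("family")
        # Some fields may map to a whole family and carry no qualifier.
        qualifier = field.get("qualifier") or ""
        columns[field.get("name")] = f"{family}:{qualifier}"
    return columns


columns = load_field_columns(SAMPLE_MAPPING)
for name, column in sorted(columns.items()):
    print(f"{name:20s} -> {column}")
```

To use it on a real installation, read the actual file's text and pass it to load_field_columns instead of SAMPLE_MAPPING. If the assumed names hold, the crawl status and retry counter asked about would be the status and retriesSinceFetch fields.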

Ferdy.

On Thu, Sep 13, 2012 at 5:36 AM, weishenyun <wl...@yahoo.com.cn> wrote:

> Hi everyone,
> I have recently been using Nutch 2.0 to crawl pages and store them in HBase, but
> I don't know much about the HBase table schema in Nutch 2.0. There are many
> column families and qualifiers with short names such as f:bas, f:st, f:ts,
> f:cnt, etc. Can someone explain the schema of these column families
> and qualifiers in detail? For example, if I want to get the crawl status (like
> an HTTP status code) and the retry counter of a page, which columns should I
> refer to?