Posted to user@nutch.apache.org by j....@thomsonreuters.com on 2012/08/13 05:26:47 UTC

Understanding the columns/fields in the Nutch 2.0 Webpage Table

To help myself, and hopefully others, better understand the
columns/fields in the Nutch 2.0 webpage table, I have put together a
short article for beginners at http://nlp.solutions.asia/?p=232. I would
appreciate it if anybody who knows Nutch 2.0 could quickly look it over
for blatant mistakes, as I am sure that with my level of knowledge there
are more than a few. Also, if anybody could add to the explanation of the
markers field, it would be appreciated, as it is a bit of a black box to
me.

Again, if it might be useful to beginners, feel free to link to it or
copy it to the Nutch Wiki.


Re: Understanding the columns/fields in the Nutch 2.0 Webpage Table

Posted by Ferdy Galema <fe...@kalooga.com>.
Hi,

Thanks. I've added the link to the wiki front page. The article looks
fine! Two minor things:

- The different markers actually represent batchId values, i.e. a batchId
is stored as the value of a marker.
- The http.redirect.max property is actually broken. Every redirect is
always handled later (as if the property were set to 0). Unfortunately,
there are some other properties that are not yet implemented either. In
time we should get this fixed.
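
To make the first point a bit more concrete, here is a rough Python
sketch of a markers column that maps a marker name to a batchId. It is
only an illustration under stated assumptions: the marker names, the
batch id strings and both helper functions are made up for this example
and are not taken from the Nutch source. The comparison at the end shows
one way such a value could be used, not necessarily how Nutch uses it.

```python
from typing import Dict

# All names below are hypothetical and exist only for this illustration.
GENERATE_MARKER = "generate"  # assumed marker name, not Nutch's real key
FETCH_MARKER = "fetch"        # assumed marker name, not Nutch's real key


def set_marker(markers: Dict[str, str], marker: str, batch_id: str) -> None:
    """Store the batch id as the value of the given marker."""
    markers[marker] = batch_id


def in_batch(markers: Dict[str, str], marker: str, batch_id: str) -> bool:
    """Return True if the marker is present and holds exactly this batch id."""
    return markers.get(marker) == batch_id


# One row's markers column, modelled as a plain dict.
row_markers: Dict[str, str] = {}

# A generate-like step tags the row with the batch it created.
set_marker(row_markers, GENERATE_MARKER, "batch-001")

# A later step working on "batch-001" would pick the row up, while one
# working on "batch-002" would skip it.
picked_up = in_batch(row_markers, GENERATE_MARKER, "batch-001")
skipped = not in_batch(row_markers, GENERATE_MARKER, "batch-002")

# After processing, the later step leaves its own marker with the same id.
if picked_up:
    set_marker(row_markers, FETCH_MARKER, "batch-001")

print(row_markers)
```

The point of the sketch is simply that a marker is a key and the batchId
is its value, so the same row can carry several markers at once.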

Ferdy.
