Posted to user@nutch.apache.org by feng lu <am...@gmail.com> on 2014/01/01 14:17:12 UTC

Re: Store specific nutch output values in database

Currently there is no way to define which specific fields get stored in the
DB. Every field is used somewhere in crawl processing.

Why do you want to store only some specific fields in the DB?
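
If the goal is only to read a smaller set of columns, one possible workaround
(not a Nutch feature, just a generic SQL technique) is to leave the table
untouched and query it through a view. Below is a minimal sketch of that idea.
It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs on its
own; the table name "webpage", the view name "webpage_subset" and the chosen
columns are assumptions for illustration, with column names taken from the
listing in the quoted message.

```python
import sqlite3

# Hypothetical choice of columns to expose; adjust to whatever you need.
WANTED_COLUMNS = ["id", "title", "baseUrl", "text", "fetchTime"]


def create_subset_view(conn, table="webpage", view="webpage_subset",
                       columns=WANTED_COLUMNS):
    """Create a view exposing only the chosen columns of the full table.

    The underlying table keeps every column, so the crawler can still
    read and write all of its fields.
    """
    column_list = ", ".join(columns)
    conn.execute(f"DROP VIEW IF EXISTS {view}")
    conn.execute(f"CREATE VIEW {view} AS SELECT {column_list} FROM {table}")


def demo():
    """Build a small in-memory table, create the view, and read from it."""
    conn = sqlite3.connect(":memory:")
    # Reduced stand-in for the real table: a few wanted and unwanted columns.
    conn.execute(
        "CREATE TABLE webpage ("
        "id TEXT PRIMARY KEY, title TEXT, baseUrl TEXT, text TEXT, "
        "fetchTime INTEGER, content BLOB, markers BLOB)"
    )
    conn.execute(
        "INSERT INTO webpage VALUES (?, ?, ?, ?, ?, ?, ?)",
        ("com.example:http/", "Example", "http://example.com/",
         "page text", 1388585832000, b"<html></html>", b""),
    )
    create_subset_view(conn)
    cursor = conn.execute("SELECT * FROM webpage_subset")
    names = [description[0] for description in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return names, rows
```

Calling demo() returns the view's column names and its rows, so you can check
that only the selected columns come back. On MySQL the same CREATE VIEW
statement would be issued through a MySQL client instead of sqlite3.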


On Tue, Dec 31, 2013 at 2:13 AM, rk_sharma <rk...@yahoo.com> wrote:

> Hi.
>
> I am using Nutch-2.1 with MySQL as the database. My crawl results are
> correct and are stored in multiple DB fields. My DB fields are:
>
> +-------------------+---------------+------+-----+---------+-------+
> | Field             | Type          | Null | Key | Default | Extra |
> +-------------------+---------------+------+-----+---------+-------+
> | id                | varchar(767)  | NO   | PRI | NULL    |       |
> | headers           | blob          | YES  |     | NULL    |       |
> | text              | mediumtext    | YES  |     | NULL    |       |
> | status            | int(11)       | YES  |     | NULL    |       |
> | markers           | blob          | YES  |     | NULL    |       |
> | parseStatus       | blob          | YES  |     | NULL    |       |
> | modifiedTime      | bigint(20)    | YES  |     | NULL    |       |
> | score             | float         | YES  |     | NULL    |       |
> | typ               | varchar(32)   | YES  |     | NULL    |       |
> | baseUrl           | varchar(767)  | YES  |     | NULL    |       |
> | content           | longblob      | YES  |     | NULL    |       |
> | title             | varchar(2048) | YES  |     | NULL    |       |
> | reprUrl           | varchar(767)  | YES  |     | NULL    |       |
> | fetchInterval     | int(11)       | YES  |     | NULL    |       |
> | prevFetchTime     | bigint(20)    | YES  |     | NULL    |       |
> | inlinks           | mediumblob    | YES  |     | NULL    |       |
> | prevSignature     | blob          | YES  |     | NULL    |       |
> | outlinks          | mediumblob    | YES  |     | NULL    |       |
> | fetchTime         | bigint(20)    | YES  |     | NULL    |       |
> | retriesSinceFetch | int(11)       | YES  |     | NULL    |       |
> | protocolStatus    | blob          | YES  |     | NULL    |       |
> | signature         | blob          | YES  |     | NULL    |       |
> | metadata          | blob          | YES  |     | NULL    |       |
> +-------------------+---------------+------+-----+---------+-------+
>
> Is there any mechanism through which I can remove some of these columns?
> In other words, I need to store only some specific columns.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Store-specific-nutch-output-values-in-database-tp4108762.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Don't Grow Old, Grow Up... :-)