Posted to user@nutch.apache.org by jcoffield <co...@hotmail.com> on 2011/08/23 16:59:10 UTC

How to store data in new column in MySQL database Nutch 2.0

Hi there,

I'm a newbie with Nutch.  I need to store data from a crawl in specific
columns in the webpage table in the Nutch database in MySQL.  I have the
columns being created by changing gora-sql-mapping.xml, and changing schema
and field info in org.apache.nutch.storage.WebPage.

I only need to crawl 2 websites and store information from specific elements
in specific columns. My question is, how do I get Nutch to use these new
columns?  I assume I need to create a Parser plugin and set the field values
via a regex.  Any suggestions or direction?

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-store-data-in-new-column-in-MySQL-database-Nutch-2-0-tp3278250p3278250.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: How to store data in new column in MySQL database Nutch 2.0

Posted by Markus Jelsma <ma...@openindex.io>.
You're using the Nutch 2.0 trunk, which I don't know much about. However, if I
wanted to send parsed data from a crawl to some DB, I would first use the
indexchecker (Nutch 1.4) to obtain the to-be-indexed values from stdout and do
some scripting. If I were to use it on a larger scale, I'd modify the indexers
to send data to some DB instead of Solr.
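
The scripting step could look roughly like the sketch below. It is only an
illustration under several assumptions: that the indexchecker prints one
"name :<tab>value" line per field, that the table is called webpage_extra
(a hypothetical name), and that sqlite3 is acceptable as a stand-in for
MySQL so the example runs without a server. The helper names are made up
for this example and are not part of Nutch.

```python
import sqlite3


def parse_checker_output(text):
    """Parse 'name :<tab>value' lines into a dict.

    The line format is an assumption about the indexchecker's stdout;
    lines that do not match (e.g. progress messages) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" :\t")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def store_fields(conn, url, fields, columns):
    """Insert the selected fields into the hypothetical webpage_extra table."""
    # Column names come from a fixed list chosen by the caller,
    # never from crawled data; the values are bound as parameters.
    col_sql = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO webpage_extra (url, {col_sql}) VALUES (?, {placeholders})",
        [url] + [fields.get(c) for c in columns],
    )
    conn.commit()


# Made-up sample output, standing in for what the checker would print.
SAMPLE = "fetching: http://example.com/\ntitle :\tExample Domain\nhost :\texample.com\n"

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
conn.execute("CREATE TABLE webpage_extra (url TEXT, title TEXT, host TEXT)")
parsed = parse_checker_output(SAMPLE)
store_fields(conn, "http://example.com/", parsed, ["title", "host"])
```

For a real run, the sample string would be replaced by the captured stdout
of the checker, and the sqlite3 connection by a MySQL client connection
(whose placeholder style may differ from "?").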

> Hi there,
> 
> I'm a newbie with Nutch.  I need to store data from a crawl in specific
> columns in the webpage table in the Nutch database in MySQL.  I have the
> columns being created by changing gora-sql-mapping.xml, and changing schema
> and field info in org.apache.nutch.storage.WebPage.
> 
> I only need to crawl 2 websites and store information from specific
> elements in specific columns. My question is, how do I get Nutch to use
> these new columns?  I assume I need to create a Parser plugin and set the
> field values via a regex.  Any suggestions or direction?