Posted to user@nutch.apache.org by jcoffield <co...@hotmail.com> on 2011/08/23 16:59:10 UTC
How to store data in new column in MySQL database Nutch 2.0
Hi there,
I'm a newbie with Nutch. I need to store data from a crawl in specific
columns of the webpage table in the Nutch database in MySQL. I already have
the columns created, by editing gora-sql-mapping.xml and by changing the
schema and field info in org.apache.nutch.storage.WebPage.
I only need to crawl 2 websites and store information from specific elements
in specific columns. My question is: how do I get Nutch to populate these new
columns? I assume I need to create a Parser plugin and set the field values
via a regex. Any suggestions or direction?
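To make the regex idea concrete, here is a minimal sketch of the per-site "regex to column" mapping. It is written in Python purely for illustration; an actual Nutch parser plugin would be Java, and nothing here uses a Nutch API. The host name, column names, and patterns are all hypothetical.

```python
import re

# Hypothetical per-site rules: column name -> regex with one capture group.
# The host, the column names and the patterns are made up for illustration.
SITE_RULES = {
    "example.com": {
        "product_name": re.compile(
            r'<h1[^>]*class="product"[^>]*>(.*?)</h1>', re.S | re.I
        ),
        "price": re.compile(
            r'<span[^>]*id="price"[^>]*>\s*([\d.,]+)\s*</span>', re.S | re.I
        ),
    },
}


def extract_columns(host, html):
    """Return {column: value} for every rule of `host` that matches `html`."""
    values = {}
    for column, pattern in SITE_RULES.get(host, {}).items():
        match = pattern.search(html)
        if match:
            values[column] = match.group(1).strip()
    return values


if __name__ == "__main__":
    page = '<h1 class="product">Blue Widget</h1><span id="price"> 19.99 </span>'
    print(extract_columns("example.com", page))
```

Hosts without rules, and pages where no pattern matches, simply yield an empty mapping, so the extra columns would stay unset for those rows.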
Re: How to store data in new column in MySQL database Nutch 2.0
Posted by Markus Jelsma <ma...@openindex.io>.
You're using Nutch 2.0 trunk, and I don't know a lot about it. However, if I
wanted to send parsed data from a crawl to some DB, I would first use the
indexchecker (Nutch 1.4) to obtain the values to be indexed from stdout and do
some scripting. If I were to use it on a larger scale, I'd modify the indexers
to send data to some DB instead of Solr.
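A rough sketch of the scripting step could look like the following. It assumes the indexchecker prints one "name :<TAB>value" line per field; check the real output of your Nutch version before relying on that. The sample output, the table name `pages`, and its columns are hypothetical, and an in-memory SQLite database stands in for MySQL so the sketch runs without a server.

```python
import sqlite3

# Assumed sample of indexchecker output: one "name :<TAB>value" line per field.
# The exact format is an assumption; compare it with your own stdout.
SAMPLE_OUTPUT = """fetching: http://example.com/
parsing: http://example.com/
title :\tExample Domain
url :\thttp://example.com/
content :\tExample Domain This domain is for use in examples
"""


def parse_fields(stdout_text):
    """Collect 'name :<TAB>value' lines into a dict; other lines are skipped."""
    fields = {}
    for line in stdout_text.splitlines():
        name, sep, value = line.partition(" :\t")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def store(conn, fields, columns=("url", "title", "content")):
    """Insert the chosen fields as one row; missing fields become NULL."""
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO pages ({', '.join(columns)}) VALUES ({placeholders})",
        [fields.get(c) for c in columns],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
    conn.execute("CREATE TABLE pages (url TEXT, title TEXT, content TEXT)")
    store(conn, parse_fields(SAMPLE_OUTPUT))
    print(conn.execute("SELECT url, title FROM pages").fetchall())
```

In practice the text would come from piping the indexchecker's stdout into the script rather than from a constant. With a MySQL driver the placeholder style is usually `%s` instead of `?`.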