Posted to user@nutch.apache.org by John Reidy <jo...@reidysystems.com> on 2006/05/03 07:04:28 UTC

Reading data from mysql (was Saving Metadata to Mysql)

Sorry for the delay in replying.

Version 0.8 looks very interesting. In time we will store more and more
of our metadata in Nutch.

The aspect that I am looking at is configuring a flexible fetcher that
will read from a variety of sources (e.g. an RDBMS and apps running on
top of a database) and then index this information and make it available.
The hardest part is working around the java.net class, which (for reasons
I can appreciate) cannot be subclassed.

The app I have in mind is a web-service-style document management app, so
all of the calls are URLs anyway.
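
A minimal sketch of one possible workaround, not what was actually built.
It assumes the class meant above is java.net.URL (which is final) and uses
the standard URLStreamHandler extension point instead of subclassing. The
"record" scheme, the names RecordUrlSketch and RecordHandler, and the
placeholder body are all hypothetical; a real handler would query the
database or application where the comment indicates.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

class RecordUrlSketch {

    /** Hypothetical handler for a made-up "record:" URL scheme. */
    static class RecordHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL url) {
            return new URLConnection(url) {
                @Override
                public void connect() {
                    // Nothing to open in this sketch; just mark as connected.
                    connected = true;
                }

                @Override
                public InputStream getInputStream() throws IOException {
                    connect();
                    // Placeholder: a real handler would look up the record
                    // identified by the URL path in the RDBMS or application.
                    String body = "content for " + getURL().getPath();
                    return new ByteArrayInputStream(
                            body.getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    /** Reads the content behind a "record:" URL through the custom handler. */
    static String fetch(String spec) throws IOException {
        // Passing the handler explicitly avoids subclassing java.net.URL.
        // (This constructor is deprecated in recent JDKs but still works.)
        URL url = new URL(null, spec, new RecordHandler());
        try (InputStream in = url.openConnection().getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("record://docs/42"));
    }
}
```

The handler is passed per URL here to keep the sketch self-contained; a
handler can also be registered JVM-wide through
URL.setURLStreamHandlerFactory, which may only be called once per JVM.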

Regards

John

Stefan Groschupf wrote:

> It depends on what you are planning to do. Nutch 0.8 supports metadata
> that is very flexible (key-value tuples) and fast.
> You can also store information in parseData.getMetaData; it will
> remain available through indexing as well.
>
>
>
> On 12.04.2006 at 04:31, sudhendra seshachala wrote:
>
>> Sorry for just jumping in.
>> We have a doc id associated with each document when we index. We could
>> store the doc id in a MySQL table and use it to query the Nutch database.
>> When parsing, capture the things needed as part of the "metadata".
>> Index the metadata; the associated doc id is stored in MySQL.
>>
>> Does that give you any ideas?
>> Please do share your concerns. I am working on something similar, where
>> eventually we will have to adopt a database.
>>
>> Thanks
>>
>>
>>
>> John Reidy <jo...@reidysystems.com> wrote:
>>
>> I am looking at something similar.
>>
>> I would guess the place to put it is the indexer. As I understand it,
>> the parser runs for just about everything fetched, whereas the indexer
>> is only run for pages you want to index.
>> I am also looking at having static objects (e.g. a connection) that are
>> initialised when the plugin is loaded, ideally through the startup
>> method.
>>
>> Regards
>>
>> John
>>
>>> Hey all,
>>> I have written a custom HTML parser and indexer. I would like to save
>>> some information that I have gathered during the parse in a MySQL DB.
>>> I imagine there could be some performance hit here (e.g. connecting to
>>> the db). What's the best place to add code to save this information -
>>> the parser or the indexer?
>>>
>>> -Mike
>>> -- 
>>> View this message in context:
>>> http://www.nabble.com/Saving-Metadata-to-Mysql-t1389216.html#a3732992
>>> Sent from the Nutch - User forum at Nabble.com.
>>>
>>>
>>>
>>
>>
>>
>>
>>   Sudhi Seshachala
>>   http://sudhilogs.blogspot.com/
>>
>>
>>
>
>
> ---------------------------------------------------------------
> company:        http://www.media-style.com
> forum:        http://www.text-mining.org
> blog:            http://www.find23.net
>
>
>