Posted to user@nutch.apache.org by John Reidy <jo...@reidysystems.com> on 2006/05/03 07:04:28 UTC
Reading data from mysql (was Saving Metadata to Mysql)
Sorry for the delay in replying.
v0.8 looks very interesting. In time we will store more and more of our
metadata in Nutch.
The aspect that I am looking at is configuring a flexible fetcher that
will read from a variety of sources (e.g. an RDBMS and apps running on top of
a database) and then index this information and make it available.
The hardest part is working around the java.net class, which (for reasons
I can appreciate) cannot be subclassed.
The app I have in mind is a web-service-type document management app, so
all of the calls are URLs anyway.
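
To illustrate the kind of workaround I mean, here is a minimal sketch. It
assumes the class in question is java.net.URL, which is declared final. The
standard library lets you attach a custom URLStreamHandler to a URL instead of
subclassing it. The "record:" scheme, the RecordUrlDemo class and the in-memory
RECORDS map are all made up for the example; a real fetcher would look the
content up in the database or the document management app instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class RecordUrlDemo {

    /** Hypothetical in-memory stand-in for rows that would come from a database. */
    static final Map<String, String> RECORDS = Map.of(
            "/doc/1", "First document body",
            "/doc/2", "Second document body");

    /** Handler for the made-up "record:" scheme; URL itself is never subclassed. */
    static class RecordHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL url) {
            return new URLConnection(url) {
                @Override
                public void connect() {
                    // Nothing to open for the in-memory stand-in.
                    connected = true;
                }

                @Override
                public InputStream getInputStream() throws IOException {
                    // Use the URL path as the lookup key.
                    String body = RECORDS.get(getURL().getPath());
                    if (body == null) {
                        throw new IOException("No record for " + getURL());
                    }
                    return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    /** Builds a URL bound to the custom handler and reads its content as text. */
    static String fetch(String spec) throws IOException {
        // This constructor is deprecated in recent JDKs but still available.
        URL url = new URL(null, spec, new RecordHandler());
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("record:/doc/1"));
    }
}
```

Because the handler is passed to the URL constructor, nothing is registered
globally, and the rest of the code keeps working with ordinary URL objects.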
Regards
John
Stefan Groschupf wrote:
> Depends on what you are planning to do. Nutch 0.8 supports metadata that
> is very flexible (key-value tuples) and fast.
> You can also store information in parseData.getMetaData; it will
> be available until indexing as well.
>
>
>
> On 12.04.2006 at 04:31, sudhendra seshachala wrote:
>
>> Sorry for just jumping in.
>> We have a doc ID associated with each document when we index. We could
>> store the doc ID in a MySQL table and use it to query the Nutch database.
>> When parsing, capture the things needed as part of the "metadata".
>> Index the metadata; the associated doc ID is stored in MySQL.
>>
>> Does that give you any ideas?
>> Please do share your concerns. I am working on something similar, where
>> eventually we will have to adopt a database.
>>
>> Thanks
>>
>>
>>
>> John Reidy <jo...@reidysystems.com> wrote:
>> I am looking at something similar.
>>
>> I would guess the place to put it is the indexer. As I understand it,
>> the parser runs for just about everything fetched, whereas the indexer is
>> only run for pages you want to index.
>> I am also looking at having static objects (e.g. a connection) that are
>> initialised when the plugin is loaded, ideally through the startup
>> method.
>>
>> Regards
>>
>> John
>>
>>> Hey all,
>>> I have written a custom HTML parser and indexer. I would like to save
>>> some information that I gathered during the parse in a MySQL DB. I
>>> imagine there could be a performance hit here (e.g. connecting to the
>>> DB). What's the best place to add code to save this information: the
>>> parser or the indexer?
>>>
>>> -Mike
>>> --
>>> View this message in context: http://www.nabble.com/Saving-Metadata-to-Mysql-t1389216.html#a3732992
>>> Sent from the Nutch - User forum at Nabble.com.
>>>
>>>
>>>
>>
>>
>>
>>
>> Sudhi Seshachala
>> http://sudhilogs.blogspot.com/
>>
>>
>>
>>
>
>
> ---------------------------------------------------------------
> company: http://www.media-style.com
> forum: http://www.text-mining.org
> blog: http://www.find23.net
>
>
>