Posted to dev@nutch.apache.org by Keith Campbell <ke...@mac.com> on 2005/11/26 17:57:43 UTC

Unsubscribe me, please.

 
On Thursday, November 24, 2005, at 04:03PM, Jérôme Charron <je...@gmail.com> wrote:

>> Over the last few years, I have noticed one thing that matters in a search
>> engine: minimalism.
>
>If you are honest, Stefan, take a closer look at the end of the proposal
>(here is a copy):
>Issues
>
>Create performance benchmarks and ensure that the new implementation gives
>at least the same performance as the parse-html plugin (the most-used parse
>plugin in a whole-web crawl).
>
>> Minimalism. Minimalism == speed, speed == scalability,
>
>speed == scalability????
>Oh, damn, is this a new theory, Stefan?
>
>
>> scalability == serious
>
>high availability == serious (too)
>monitoring == serious (too)
>There is a lot of serious stuff, you know, and I really think that
>features == serious (too)
>
>> I don't think it would be a good move to slow down HTML parsing (the
>> most-used parser) to make writing RSS parsers easier for developers.
>
>One more time: take a closer look at the proposal. The idea is to provide a
>convenient way to add markup-language-related plugins (you know, RSS and
>Atom are the first steps toward more structured content... more is to come),
>not to replace the existing HTML and RSS ones if their performance is
>better.
>Adapting the HTML and RSS parsers to the proposal is only for architectural
>"beauty"; it is not mandatory.
>You know, Nutch is actually widely used for thematic and intranet search
>engines, and in such a context this proposal really makes sense (just as, in
>such a context, it makes sense to have a protocol-jdbc plugin, for instance).
>
>> From my perspective, we have much more general things to solve in
>> Nutch (manageability, monitoring, NDFS block-based task routing, more
>> dynamic search servers) than improving things we already have.
>
>That's your point of view.
>You know, I think there is something magic about Nutch: people are
>focused on different subjects.
>Some are more focused on infrastructure, some others on parsing, some others
>on language technology...
>That's a big opportunity for Nutch... our complementarity...
>(But it's true that the subjects you mentioned are very interesting
>improvements, especially monitoring. It cannot be a serious product deployed
>on many nodes if there is no way to monitor the whole system.)
>
>
>> Anyway as you may know we have a plugin system and one goal of the
>> plugin system is to give developers the freedom to develop custom
>> plugins. :-)
>
>Yes, since I have corrected many bugs in the plugin system (not yours, I
>hope), I clearly understand how it works and what its goal is...
> ;-)
>
>> P.S. Do you think it makes sense to run another public Nutch mailing
>> list, since 'THE nutch [...]' (mailing list is nutch-dev@lucene.apache.org),
>> 'Isn't it?'
>> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg01513.html
>
>Is there another public Nutch mailing list somewhere, Stefan?
>Please give me the address...
>
>Best Regards
>
>Jérôme
>
>--
>http://motrech.free.fr/
>http://www.frutch.org/
>
>