
Posted to dev@nutch.apache.org by bruce <be...@earthlink.net> on 2006/08/19 03:48:24 UTC

architecture question/thoughts

hi...

I'm playing around with an app that parses websites, extracts information,
and returns selected pieces of it to my system.

My main question is how to architect the part of the system that puts the
information into my database (I'm testing with MySQL), and how to scale it.
If I have a server spawning hundreds of apps, each sending a page request to
a web server, there will be more than enough results coming back to swamp a
single MySQL server with writes...
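One common way to keep many fetchers from overwhelming the database is to decouple them with a bounded queue and a single writer that inserts rows in batches. The sketch below is a minimal illustration under stated assumptions: `fake_fetch_and_parse`, the `pages` table and all parameter names are hypothetical, and an in-memory `sqlite3` database stands in for MySQL so the example runs with only the standard library.

```python
import queue
import sqlite3
import threading

DONE = object()  # marker each fetcher puts on the queue when it has finished


def fake_fetch_and_parse(url: str) -> tuple[str, str]:
    # Hypothetical stand-in for fetching a page and extracting information.
    return (url, f"extracted from {url}")


def fetcher(urls: list[str], q: queue.Queue) -> None:
    for url in urls:
        # put() blocks while the queue is full, so fetchers slow down
        # instead of piling up work the database cannot absorb (backpressure).
        q.put(fake_fetch_and_parse(url))
    q.put(DONE)


def flush(conn: sqlite3.Connection, batch: list[tuple[str, str]]) -> None:
    # One multi-row insert and one commit per batch, not per page.
    conn.executemany("INSERT INTO pages (url, info) VALUES (?, ?)", batch)
    conn.commit()


def run(urls: list[str], n_fetchers: int = 4, batch_size: int = 50,
        max_queued: int = 200) -> tuple[int, int]:
    """Run fetcher threads feeding one batched writer; return (rows, batches)."""
    conn = sqlite3.connect(":memory:")  # assumption: stands in for MySQL
    conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, info TEXT)")
    q: queue.Queue = queue.Queue(maxsize=max_queued)

    # Round-robin split of the URL list among the fetcher threads.
    chunks = [urls[i::n_fetchers] for i in range(n_fetchers)]
    threads = [threading.Thread(target=fetcher, args=(c, q)) for c in chunks]
    for t in threads:
        t.start()

    # The main thread is the only database writer.
    batch: list[tuple[str, str]] = []
    batches = 0
    finished = 0
    while finished < n_fetchers:
        item = q.get()
        if item is DONE:
            finished += 1
            continue
        batch.append(item)
        if len(batch) >= batch_size:
            flush(conn, batch)
            batches += 1
            batch = []
    if batch:  # write whatever is left over
        flush(conn, batch)
        batches += 1

    for t in threads:
        t.join()
    rows = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    conn.close()
    return rows, batches
```

For example, `run([f"http://example.com/{i}" for i in range(1000)], batch_size=50)` writes 1000 rows in 20 batched transactions rather than 1000 separate ones. With a MySQL DB-API driver the same structure applies, though drivers such as MySQLdb use `%s` placeholders instead of `?`.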

So how do other apps/crawlers handle this kind of situation? Basically, I'm
trying to figure out how to implement some kind of scalable funneling
mechanism that would let me have 10-20 servers crawling the specific sites
and returning the information to a database...
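With 10-20 crawl servers, one simple way to divide the work is to assign each site to exactly one server by hashing its host name, so no two servers hit the same site and each server can funnel its own results. The sketch below is a hypothetical illustration of that idea, not a description of how Nutch does it; `server_for` and `partition` are made-up names. It uses MD5 rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would give different answers on different machines.

```python
import hashlib
from urllib.parse import urlsplit


def server_for(url: str, n_servers: int) -> int:
    """Map a URL to a crawl server index based only on its host name."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    digest = hashlib.md5(host.encode("utf-8")).digest()
    # Stable across processes and machines, unlike the built-in hash().
    return int.from_bytes(digest[:8], "big") % n_servers


def partition(urls: list[str], n_servers: int) -> dict[int, list[str]]:
    """Group URLs into per-server fetch lists; one host never spans servers."""
    buckets: dict[int, list[str]] = {i: [] for i in range(n_servers)}
    for url in urls:
        buckets[server_for(url, n_servers)].append(url)
    return buckets
```

Each server could then run a local batched writer, or ship its batches to a central loader, so the database sees a few large writes per server instead of one write per fetched page.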

Any thoughts, comments, or pointers on how to deal with this would be helpful!!

thanks

-bruce