Posted to user@nutch.apache.org by og...@yahoo.com on 2006/08/09 21:16:28 UTC
Re: [Nutch-general] Single DFS or alternative architectures for performance?
Hi Dennis,
I'd be curious about the outcome of your experiment, so please post the summary, if you remember.
Thanks,
Otis
----- Original Message ----
From: Dennis Kubes <nu...@dragonflymc.com>
To: nutch-user@lucene.apache.org
Sent: Wednesday, August 9, 2006 10:39:38 AM
Subject: Re: [Nutch-general] Single DFS or alternative architectures for performance?
You wouldn't want to use the DFS for searching. You would want to use
the DFS/MapReduce for creating the index and slicing it up into
segments of, say, 1-2 million pages each. Those individual index
segments would then need to be moved to local file systems, each with a
search server running against that specific part of the index. You
would then have the search client (usually a website) sit in front of
the search servers and use the searchservers.txt file to specify the
search servers it connects to. The search client would aggregate the
results from the multiple index search servers and return them to the
end user.
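A minimal sketch of the aggregation step described above, assuming each line of a searchservers.txt-style file holds a host and a port separated by whitespace, and that each search server returns its hits already sorted by score. The class, record, and method names are hypothetical and are not Nutch's own distributed-search API; network calls are omitted, so the per-server hit lists are passed in directly. The sketch only shows how per-server results can be merged into one global top-k list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SearchAggregatorSketch {

    // Hypothetical representation of one entry in a searchservers.txt-style file.
    record Server(String host, int port) {}

    // Hypothetical hit returned by one search server.
    record Hit(String url, float score, String server) {}

    // Parse lines of the form "host port"; blank lines and '#' comments are skipped.
    static List<Server> parseServers(List<String> lines) {
        List<Server> servers = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) continue;
            String[] parts = trimmed.split("\\s+");
            servers.add(new Server(parts[0], Integer.parseInt(parts[1])));
        }
        return servers;
    }

    // Merge the hit lists from all servers into a single global top-k by score.
    static List<Hit> mergeTopK(List<List<Hit>> perServer, int k) {
        // Min-heap keyed on score: the root is the weakest hit kept so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
        for (List<Hit> hits : perServer) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) heap.poll(); // drop the lowest-scoring hit
            }
        }
        List<Hit> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble(Hit::score).reversed()); // best first
        return result;
    }

    public static void main(String[] args) {
        List<Server> servers = parseServers(List.of("searcher1 9999", "searcher2 9999"));
        List<Hit> fromFirst = List.of(new Hit("http://a/1", 0.9f, "searcher1"), new Hit("http://a/2", 0.5f, "searcher1"));
        List<Hit> fromSecond = List.of(new Hit("http://b/1", 0.8f, "searcher2"), new Hit("http://b/2", 0.7f, "searcher2"));
        System.out.println(servers.size() + " servers");
        for (Hit h : mergeTopK(List.of(fromFirst, fromSecond), 3)) {
            System.out.println(h.score() + " " + h.url() + " (" + h.server() + ")");
        }
    }
}
```

Because each server only needs to contribute its own top k hits, the client never has to hold more than k hits per server in memory, which is one reason this front-end/back-end split scales with the number of index segments.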
We are currently using 1 million pages per index segment, although others
on the list have reported going up to 2 million pages without problems.
Beyond that, queries tend to slow down because of the time it takes to
read each individual index segment. We have been running a separate
server for each search segment but are currently experimenting with a
single search server with many small disks (say 10 x 20G), each disk
holding one index segment. I don't know yet whether that will work.
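The sizing above reduces to simple arithmetic: with at most 1 million pages per index segment, N pages need ceil(N / 1,000,000) segments, and on a single server with several disks each segment can be mapped to one disk. The sketch below assumes pages are sliced into contiguous ranges and assigned to disks round-robin; the mount-point names and the `SegmentPlanSketch` class are hypothetical illustrations, not part of Nutch.

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentPlanSketch {

    // Hypothetical slice: a contiguous range of page ordinals [start, end) on one disk.
    record Slice(int index, long start, long end, String disk) {}

    // Segments needed when each holds at most pagesPerSegment pages (ceiling division).
    static long segmentCount(long totalPages, long pagesPerSegment) {
        return (totalPages + pagesPerSegment - 1) / pagesPerSegment;
    }

    // Split totalPages into slices and assign each slice to a disk, round-robin.
    static List<Slice> plan(long totalPages, long pagesPerSegment, List<String> disks) {
        List<Slice> slices = new ArrayList<>();
        long n = segmentCount(totalPages, pagesPerSegment);
        for (int i = 0; i < n; i++) {
            long start = i * pagesPerSegment;
            long end = Math.min(start + pagesPerSegment, totalPages);
            slices.add(new Slice(i, start, end, disks.get(i % disks.size())));
        }
        return slices;
    }

    public static void main(String[] args) {
        // Ten hypothetical mount points, one per disk (cf. "10 x 20G").
        List<String> disks = new ArrayList<>();
        for (int d = 0; d < 10; d++) disks.add("/mnt/disk" + d);
        // 9.5 million pages at 1 million pages per segment: 10 segments, the last one partial.
        for (Slice s : plan(9_500_000L, 1_000_000L, disks)) {
            System.out.println(s.index() + ": pages " + s.start() + "-" + (s.end() - 1) + " on " + s.disk());
        }
    }
}
```

If there are more segments than disks, round-robin places several segments on the same disk, which would reintroduce the per-disk read contention that the one-segment-per-disk layout is meant to avoid.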
Dennis
Murat Ali Bayir wrote:
> Hi everybody,
>
> Does a system with one DFS (crawl, parse, index, search, etc. all on
> one DFS) have performance problems in the search part? What if two
> DFSs were used: one for the search part (getting summaries, etc.) and
> the other for the remaining Nutch operations (fetch, parse, index,
> etc.)? Or are there any alternative architectures for systems
> performing all the Nutch functions concurrently on one DFS?
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
Re: HTMLParseFilter is not called by ParseSegment (nutch parse command)
Posted by Bipin Parmar <bi...@yahoo.com>.
Hi,
Please ignore my earlier question regarding the parse
command / HtmlParseFilter plugin. It was my mistake.
Plugins implementing HtmlParseFilter are called
during parse.
Thank you,
Bipin
--- Bipin Parmar <bi...@yahoo.com> wrote:
> Hi,
>
> I have written a plugin implementing the
> org.apache.nutch.parse.HtmlParseFilter extension
> point. When I execute "fetch", it gets appropriately
> called.
>
> When I execute "fetch -noParsing", it does not get
> called. I think this is how it is supposed to work.
>
> However, when I execute "parse", I expected my
> plugin implementing HtmlParseFilter to be called,
> but it is not. The parse of the segment completes
> successfully.
>
> Shouldn't "parse" call plugins implementing
> HtmlParseFilter?
>
> I use the same nutch-default.xml for both the fetch
> and parse commands. I tried changing
> parse-plugins.xml by adding my plugin to the
> "text/html" content type, but it did not help.
>
> Please help!
>
> Thank you,
>
> Bipin
> I am using the nutch-nightly build dated 08/07/2006.
>
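The behavior confirmed in the reply can be modeled with a small, self-contained sketch: HTML parse filters run as part of the parse step, whether that step happens inside fetch or later through the separate parse command, and they are skipped only when parsing itself is skipped (as with fetch -noParsing). The interface and class names below are hypothetical stand-ins written for illustration; they are not Nutch's HtmlParseFilter API, whose real method takes Nutch-specific types.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseFilterModelSketch {

    // Hypothetical stand-in for a parse-filter extension point.
    interface HtmlFilter {
        String filter(String url, String parsedText);
    }

    // Hypothetical parse step: parse the raw HTML, then pass the result through every filter.
    static String parse(String url, String html, List<HtmlFilter> filters) {
        String text = html.replaceAll("<[^>]*>", ""); // crude tag stripping, for illustration only
        for (HtmlFilter f : filters) {
            text = f.filter(url, text);
        }
        return text;
    }

    // Hypothetical fetch step: with parsing disabled, only raw content is kept and
    // no filter runs; a later parse step then runs the same filter chain.
    static String fetch(String url, String html, boolean parsing, List<HtmlFilter> filters) {
        return parsing ? parse(url, html, filters) : html;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        HtmlFilter recording = (url, text) -> { calls.add(url); return text.toUpperCase(); };
        List<HtmlFilter> filters = List.of(recording);

        String html = "<p>hello</p>";
        fetch("http://example.com/a", html, true, filters);               // filter runs during fetch
        String raw = fetch("http://example.com/b", html, false, filters); // like -noParsing: no filter
        parse("http://example.com/b", raw, filters);                      // separate parse: filter runs
        System.out.println(calls);
    }
}
```

In this model, a filter that appears not to run during a separate parse step points to a configuration or plugin-loading problem rather than to the parse command skipping filters by design.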