Posted to dev@nutch.apache.org by Aisha <ai...@yahoo.com> on 2006/11/03 15:53:20 UTC
Fetcher freezes
Hi,
I don't know why, but I have received no answer on the three forums where I
posted my problem.
Since the fetcher freezes every time I try to fetch my file system, I can't
imagine that I am the only one who has this problem. As I said in my last
e-mail, I found many messages about this problem, but no solution seems to
have been found.
It is a big problem, so I don't understand why nobody seems interested in
it.
I am trying to crawl my file system, but the crawl never finishes; it aborts
with the message "Aborting with 3 hung threads".
The number of hung threads is not the same when I retry.
I modified the configuration, increasing the number of threads, but that
doesn't solve the problem.
Could somebody please help me?
I can't crawl my file system.
Thanks in advance.
Aïcha
--
View this message in context: http://www.nabble.com/Fetcher-freezes-tf2568287.html#a7158776
Sent from the Nutch - Dev mailing list archive at Nabble.com.
Re: Fetcher freezes
Posted by Aisha <ai...@yahoo.com>.
Hi,
I am not in my office, so I will try on Monday and send you the logs and the
configuration file I use.
However, the freeze does not seem to be linked to a particular file, because
in the logs the freezes don't occur
at the same time.
Thank you for your answer.
I will contact you on Monday.
Have a good week.
Aïcha
Stefan Groschupf-2 wrote:
>
> Hi,
>
> Try running without a regular expression filter and check whether this helps.
> Let me know if this solves the problem.
> You may want to do a thread dump and send the log to the list to
> check where exactly the fetcher freezes.
>
> Stefan
>
Re: Fetcher freezes
Posted by Aisha <ai...@yahoo.com>.
Hi,
My configuration was as Dennis Kubes suggested on the nutch-user forum, but I
still have the problem.
I think the problem was fixed for the http protocol with NUTCH-344 and the
configuration:
<property>
<name>http.max.delays</name>
<value>30</value>
</property>
but using the configuration:
<property>
<name>fetcher.max.crawl.delay</name>
<value>30</value>
</property>
doesn't fix the problem when crawling the file system.
To repeat, I am using the Nutch nightly build of 19/10/2006.
Re: Fetcher freezes
Posted by Aisha <ai...@yahoo.com>.
Hi,
I don't know if I understood "no regular expression filter" correctly, but I
deleted the urlfilter from my nutch-site.xml.
This is my nutch-site.xml configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>plugin.includes</name>
<value>protocol-file|parse-(text|msword|msexcel|mspowerpoint|rtf|xml|html|js|pdf|oo)|index-basic|query-basic|summary-basic|scoring-opic</value>
</property>
<property>
<name>file.content.ignored</name>
<value>false</value>
</property>
<property>
<name>file.content.limit</name>
<value>-1</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>true</value>
</property>
<property>
<name>fetcher.threads.fetch</name>
<value>1000</value>
</property>
<property>
<name>fetcher.threads.per.host</name>
<value>1000</value>
<description>This number is the maximum number of threads that
should be allowed to access a host at one time.</description>
</property>
<property>
<name>fetcher.verbose</name>
<value>true</value>
<description>If true, fetcher will log more verbosely.</description>
</property>
<property>
<name>fetcher.server.delay</name>
<value>5.0</value>
<description>The number of seconds the fetcher will delay between
successive requests to the same server.</description>
</property>
<property>
<name>fetcher.max.crawl.delay</name>
<value>30</value>
</property>
<property>
<name>indexer.max.tokens</name>
<value>Integer.MAX_VALUE</value>
</property>
<property>
<name>db.max.outlinks.per.page</name>
<value>10000</value>
</property>
<property>
<name>db.max.anchor.length</name>
<value>200</value>
<description>The maximum number of characters permitted in an anchor.
</description>
</property>
</configuration>
The fetcher freezes after 2 hours.
As I said, the logs don't give any useful information, because each time I
run it the freeze occurs on a different directory or file.
Do I have to change something in my configuration?
Thanks in advance,
Aïcha
Stefan Groschupf-2 wrote:
>
> Hi,
>
> Try running without a regular expression filter and check whether this helps.
> Let me know if this solves the problem.
> You may want to do a thread dump and send the log to the list to
> check where exactly the fetcher freezes.
>
> Stefan
>
Re: Fetcher freezes
Posted by Stefan Groschupf <sg...@101tec.com>.
Hi,
Try running without a regular expression filter and check whether this helps.
Let me know if this solves the problem.
You may want to do a thread dump and send the log to the list to
check where exactly the fetcher freezes.
Stefan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com
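
A note on the thread dump suggested above: on a Unix-like system a running JVM prints a dump of all its threads when it receives SIGQUIT (kill -3 <pid>), and the JDK's jstack <pid> tool produces similar output. The same information can also be collected from inside the process. The following is only a minimal sketch of that idea, not something taken from Nutch; the class name ThreadDumpSketch and the method dumpAllThreads are hypothetical names chosen for illustration.

```java
import java.util.Map;

// Hypothetical helper, not part of Nutch: collects a textual dump of all live threads.
public class ThreadDumpSketch {

    // Returns one block per live thread: its name, its state, and its stack frames.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        // Thread.getAllStackTraces() is standard since Java 5.
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the dump for the current JVM; in a hung fetcher, the frames
        // at the top of each thread's stack show where that thread is waiting.
        System.out.println(dumpAllThreads());
    }
}
```

In a dump taken while the fetcher is hung, the topmost frames of the fetcher threads are the part worth posting to the list, since they show the call each thread is blocked in.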