Posted to dev@nutch.apache.org by Aisha <ai...@yahoo.com> on 2006/11/03 15:53:20 UTC

Fetcher freezes

Hi,

I don't know why, but I have had no answer on the 3 forums where I posted my
problem.
The Fetcher freeze occurs every time I try to fetch my file system, so I
can't imagine that I am the only one who has this problem. As I said in my
last e-mail, I found many mails about it, but no solution seems to have
been found.
It is a big problem, so I don't understand why nobody seems interested in
it.

I am trying to crawl my file system, but the crawl never finishes; it aborts
with the message "Aborting with 3 hung threads".

The number of hung threads is not the same if I retry.

I modified the configuration, increasing the number of threads, but it doesn't
solve the problem.

Please, could somebody help me?
I can't crawl my file system.

thanks in advance.
Aïcha

-- 
View this message in context: http://www.nabble.com/Fetcher-freezes-tf2568287.html#a7158776
Sent from the Nutch - Dev mailing list archive at Nabble.com.


Re: Fetcher freezes

Posted by Aisha <ai...@yahoo.com>.
Hi,

I am not in my office, so I will try on Monday and send you the logs and the
configuration file I use.
However, the freeze does not seem to be linked to a particular file, because
in the logs the freezes do not occur at the same time in each run.

Thank you for your answer.
I will contact you on Monday.
Have a good week.
Aïcha


Stefan Groschupf-2 wrote:
> 
> Hi,
> 
> Try running without the regular expression filter and check whether this helps.
> Let me know if this solves the problem.
> You may also want to do a thread dump and send the log to the list so we can
> check where exactly the fetcher freezes.
> 
> Stefan



Re: Fetcher freezes

Posted by Aisha <ai...@yahoo.com>.
Hi,

My configuration was as Dennis Kubes suggested on the nutch-user forum, but I
still have the problem.
I think the problem was fixed for the http protocol by NUTCH-344 and this
configuration:
<property>
  <name>http.max.delays</name>
  <value>30</value>
</property>

but this configuration:
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>30</value>
</property>

doesn't fix the problem when crawling the file system.
To repeat, I am using the Nutch nightly build of 19/10/2006.




Re: Fetcher freezes

Posted by Aisha <ai...@yahoo.com>.
Hi,

I don't know if I understood "no regular expression filter" correctly, but I
deleted the urlfilter from my nutch-site.xml.

This is my nutch-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>

<property>
  <name>plugin.includes</name>
  <value>protocol-file|parse-(text|msword|msexcel|mspowerpoint|rtf|xml|html|js|pdf|oo)|index-basic|query-basic|summary-basic|scoring-opic</value>
</property>

<property>
  <name>file.content.ignored</name>
  <value>false</value>
</property>

<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>

<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
</property>

<property>
  <name>fetcher.threads.fetch</name>
  <value>1000</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>1000</value>
  <description>This number is the maximum number of threads that
    should be allowed to access a host at one time.</description>
</property>

<property>
  <name>fetcher.verbose</name>
  <value>true</value>
  <description>If true, fetcher will log more verbosely.</description>
</property>
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value>
  <description>The number of seconds the fetcher will delay between 
   successive requests to the same server.</description>
</property>

<property>
 <name>fetcher.max.crawl.delay</name>
 <value>30</value>
</property> 

<property>
  <name>indexer.max.tokens</name>
  <value>Integer.MAX_VALUE</value>
</property>


<property>
  <name>db.max.outlinks.per.page</name>
  <value>10000</value>
</property>
<property>
  <name>db.max.anchor.length</name>
  <value>200</value>
  <description>The maximum number of characters permitted in an anchor.
  </description>
</property>
</configuration>
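
A side note on the configuration above, offered as something to check rather than as a diagnosis of the hung threads: the value of indexer.max.tokens is the name of the Java constant Integer.MAX_VALUE, not a number. Assuming this property is parsed as a plain integer (an assumption, not verified against the nightly build used here), the numeric value of that constant would be written like this:

```xml
<!-- Sketch only: assumes indexer.max.tokens is read as an integer.
     2147483647 is the value of Java's Integer.MAX_VALUE. -->
<property>
  <name>indexer.max.tokens</name>
  <value>2147483647</value>
</property>
```

This property concerns indexing, so it is unlikely to be connected to the fetcher freeze itself.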


The fetcher freezes after about 2 hours.
As I said, the logs don't give useful information, because each time I run
it the freeze occurs on a different directory or file.
Do I have to change something in my configuration?

Thanks in advance,
Aïcha


Stefan Groschupf-2 wrote:
> 
> Hi,
> 
> Try running without the regular expression filter and check whether this helps.
> Let me know if this solves the problem.
> You may also want to do a thread dump and send the log to the list so we can
> check where exactly the fetcher freezes.
> 
> Stefan



Re: Fetcher freezes

Posted by Stefan Groschupf <sg...@101tec.com>.
Hi,

Try running without the regular expression filter and check whether this helps.
Let me know if this solves the problem.
You may also want to do a thread dump and send the log to the list so we can
check where exactly the fetcher freezes.

Stefan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com
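
For readers following the suggestion in this thread to run without a regular expression filter, here is a minimal sketch of what that might look like in nutch-site.xml. It assumes that "regular expression filter" refers to the urlfilter-regex plugin and that plugins are enabled through the plugin.includes pattern; the shortened plugin list is only illustrative and is based on the configuration posted earlier in the thread.

```xml
<!-- Sketch only: assumes the regex URL filter is the urlfilter-regex plugin
     and that a plugin is active only when plugin.includes matches its name.
     No urlfilter-* entry appears in the value, so no URL filter plugin
     would be loaded under that assumption. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-file|parse-(text|html)|index-basic|query-basic|summary-basic|scoring-opic</value>
</property>
```

With no URL filter loaded, every discovered URL would be accepted, so on a file system crawl the fetcher may wander outside the intended directory tree; that trade-off is worth keeping in mind when testing this.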