Posted to user@nutch.apache.org by Vicious <et...@gmail.com> on 2008/01/27 02:12:48 UTC

Fetch issue with Feeds

Hi All,

Using the latest nightly build, I am trying to run a crawl. I have set the
agent property and all relevant plugins. However, as soon as I run the crawl I
get the following error in hadoop.log. I read all the posts here, and the only
suggestion was that the http.agent property should not be empty. In my case
it isn't, and yet I still see the error. Any help would be appreciated.

Thanks-

 fetcher.Fetcher - fetch of http://feeds.wired.com/CultOfMac failed with: java.lang.NullPointerE
 http.Http - java.lang.NullPointerException
 http.Http - at org.apache.nutch.protocol.Content.getContentType(Content.java:327)
 http.Http - at org.apache.nutch.protocol.Content.<init>(Content.java:95)
 http.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
 http.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:164)
-- 
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15114911.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Fetch issue with Feeds (SOLVED)

Posted by Vinci <vi...@polyu.edu.hk>.
Hi,

I finally figured out the solution:
go to conf/
rename the old mime-types.xml to anything else,
then copy tika-mimetypes.xml into the same directory under the name
mime-types.xml
the crawler should work now.

In short, this happens because 1.0-dev uses Tika, but the old-style mime
detection config file is still being loaded.
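
The steps above can be sketched as a small shell function. This is only a
sketch under stated assumptions: the function name swap_mime_config and the
backup name mime-types.xml.old are made up for illustration, and it assumes
tika-mimetypes.xml is already present in the conf directory.

```shell
#!/bin/sh
# Sketch of the workaround described above. The function name and the
# ".old" backup suffix are illustrative assumptions, not part of Nutch.
swap_mime_config() {
    conf_dir="$1"   # path to the Nutch conf/ directory

    # Assumption: tika-mimetypes.xml is already present in conf/.
    if [ ! -f "$conf_dir/tika-mimetypes.xml" ]; then
        echo "tika-mimetypes.xml not found in $conf_dir" >&2
        return 1
    fi

    # Keep the old-format file under another name instead of deleting it.
    if [ -f "$conf_dir/mime-types.xml" ]; then
        mv "$conf_dir/mime-types.xml" "$conf_dir/mime-types.xml.old" || return 1
    fi

    # Put the Tika-format definitions where mime-types.xml is loaded from.
    cp "$conf_dir/tika-mimetypes.xml" "$conf_dir/mime-types.xml"
}
```

From the Nutch install directory this would be called as
"swap_mime_config conf", followed by rerunning the crawl. Keeping the old
file under a different name makes the change easy to undo.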



-- 
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15189897.html


Re: Fetch issue with Feeds

Posted by Vinci <vi...@polyu.edu.hk>.
Hi,

Here is some additional information: before the exception appears, Nutch
logs two messages:

fetching http://cnn.com
org.apache.tika.mime.MimeUtils load
INFO loading [mime-types.xml]
fetch of http://www.cnn.com/ failed with: java.lang.NullPointerException
Fetcher: done

It seems the mime-type configuration has a problem. Do I need to configure
the file it loads?
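
For anyone comparing their own log against the output quoted above, here is
a rough shell sketch that checks whether a log file contains both the
"loading [mime-types.xml]" message and a NullPointerException. The function
name log_shows_mime_npe is hypothetical, and finding both strings only
suggests the same situation; it does not prove the cause.

```shell
#!/bin/sh
# Sketch: succeed only if the log contains both messages seen in this
# thread. The function name is an illustrative assumption.
log_shows_mime_npe() {
    log_file="$1"   # path to the log to inspect, e.g. hadoop.log

    # -F matches the text literally, so [ and ] are not treated as a
    # regular-expression character class.
    grep -qF 'loading [mime-types.xml]' "$log_file" &&
        grep -qF 'java.lang.NullPointerException' "$log_file"
}
```

It could be used as "log_shows_mime_npe hadoop.log && echo 'same pattern'",
with the path adjusted to wherever hadoop.log lives in your setup.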




-- 
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15189590.html


Re: Fetch issue with Feeds

Posted by Vinci <vi...@polyu.edu.hk>.
Hi All,

I get the same exception when I try the nightly build on a static page.
Can anyone help?



-- 
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15189123.html