Posted to user@nutch.apache.org by Vinci <vi...@polyu.edu.hk> on 2008/01/30 20:24:41 UTC

Re: Fetch issue with Feeds (SOLVED)

Hi,

Finally, I figured out the solution:
go to conf/,
rename the old mime-types.xml to anything else,
then copy tika-mimetypes.xml into the same directory with the name
mime-types.xml.
The crawler should work now.

In short, this happens because 1.0-dev uses Tika, but the old-style MIME
detection config file is still loaded.
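If you need to apply the rename-and-copy fix on several installs, it can be scripted. The following is a minimal Python sketch, assuming it is run against a Nutch 1.0-dev conf/ directory that already contains both mime-types.xml and tika-mimetypes.xml. The function name and the backup file name mime-types.xml.old are hypothetical choices (the post only says to rename the old file to "anything else").

```python
import shutil
from pathlib import Path


def swap_in_tika_mimetypes(conf_dir, backup_name="mime-types.xml.old"):
    """Apply the workaround from this post inside a Nutch conf/ directory.

    Hypothetical helper: 1) move the old mime-types.xml aside under
    backup_name, 2) copy tika-mimetypes.xml to mime-types.xml.
    Returns the path of the new mime-types.xml.
    """
    conf = Path(conf_dir)
    old = conf / "mime-types.xml"
    tika = conf / "tika-mimetypes.xml"

    # Refuse to touch anything if the Tika config is not there.
    if not tika.is_file():
        raise FileNotFoundError(f"{tika} not found; is this a 1.0-dev conf/ directory?")

    # Rename (not delete) the old-style file; replace() overwrites an
    # existing backup on all platforms.
    if old.exists():
        old.replace(conf / backup_name)

    # Copy rather than move, so tika-mimetypes.xml also stays in place.
    shutil.copyfile(tika, old)
    return old
```

For example, `swap_in_tika_mimetypes("conf")` run from the Nutch home directory performs the same steps as described above, keeping the old file as a backup in case you need to revert.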


Vinci wrote:
> 
> Hi,
> 
> Here is some additional information: before the exception appears, Nutch
> prints 2 messages:
> 
> fetching http://cnn.com
> org.apache.tika.mime.MimeUtils load
> INFO loading [mime-types.xml]
> fetch of http://www.cnn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> 
> It seems MIME type detection has a problem... do I need to configure the file it loads?
> 
> 
> 
> Vinci wrote:
>> 
>> Hi All,
>> 
>> I get the same exception when I try the nightly build on a
>> static page. Can anyone help?
>> 
>> 
>> Vicious wrote:
>>> 
>>> Hi All,
>>> 
>>> Using the latest nightly build, I am trying to run a crawl. I have set
>>> the agent property and all relevant plugins. However, as soon as I run the
>>> crawl I get the following error in hadoop.log. I read all the posts here,
>>> and the only suggestion was that the http.agent property should not be empty.
>>> Well, in my case it isn't, and yet I see the error. Any help will be
>>> appreciated.
>>> 
>>> Thanks-
>>> 
>>>  fetcher.Fetcher - fetch of http://feeds.wired.com/CultOfMac failed
>>> with: java.lang.NullPointerE
>>>  http.Http - java.lang.NullPointerException
>>>  http.Http - at
>>> org.apache.nutch.protocol.Content.getContentType(Content.java:327)
>>>  http.Http - at
>>> org.apache.nutch.protocol.Content.<init>(Content.java:95)
>>>  http.Http - at
>>> org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
>>>  http.Http - at
>>> org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:164)
>>> 
>> 
>> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15189897.html
Sent from the Nutch - User mailing list archive at Nabble.com.