Posted to user@nutch.apache.org by Vicious <et...@gmail.com> on 2008/01/27 02:12:48 UTC
Fetch issue with Feeds
Hi All,
Using the latest nightly build, I am trying to run a crawl. I have set the
agent property and all the relevant plugins. However, as soon as I run the
crawl I get the following error in hadoop.log. I read all the posts here, and
the only suggestion was that the http.agent property should not be empty. In
my case it isn't empty, and yet I still see the error. Any help will be
appreciated.
Thanks-
fetcher.Fetcher - fetch of http://feeds.wired.com/CultOfMac failed with: java.lang.NullPointerE
http.Http - java.lang.NullPointerException
http.Http - at org.apache.nutch.protocol.Content.getContentType(Content.java:327)
http.Http - at org.apache.nutch.protocol.Content.<init>(Content.java:95)
http.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
http.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:164)
--
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15114911.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Fetch issue with Feeds (SOLVED)
Posted by Vinci <vi...@polyu.edu.hk>.
Hi,
I finally figured out the solution:
Go to conf/.
Rename the old mime-types.xml to anything else.
Then copy tika-mimetypes.xml into the same directory under the name
mime-types.xml.
The crawler should work now.
In short, this happens because 1.0-dev uses Tika, but the old MIME detection
config file is still being loaded.
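For anyone who wants to script the steps above, here is a minimal sketch in Python. It assumes both files sit in the same conf/ directory, as described. The function name swap_mime_config and the backup name mime-types.xml.old are my own choices for illustration ("anything else" works just as well); they are not part of Nutch.

```python
import shutil
from pathlib import Path


def swap_mime_config(conf_dir, backup_name="mime-types.xml.old"):
    """Back up the old mime-types.xml and put a copy of tika-mimetypes.xml in its place.

    backup_name is a hypothetical name chosen for this sketch.
    Returns the path of the backup file.
    """
    conf = Path(conf_dir)
    old = conf / "mime-types.xml"
    tika = conf / "tika-mimetypes.xml"
    backup = conf / backup_name

    # Refuse to touch anything if the Tika file is missing.
    if not tika.is_file():
        raise FileNotFoundError(f"{tika} not found; nothing was changed")

    if old.exists():
        # Do not silently overwrite an earlier backup.
        if backup.exists():
            raise FileExistsError(f"{backup} already exists; pick another backup name")
        old.rename(backup)

    # Copy (not move) so that tika-mimetypes.xml itself stays in place.
    shutil.copyfile(tika, old)
    return backup
```

Calling swap_mime_config("conf") from the Nutch install directory would perform the rename and copy in one go. The original file is kept under the backup name, so the change is easy to undo.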
--
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15189897.html
Re: Fetch issue with Feeds
Posted by Vinci <vi...@polyu.edu.hk>.
Hi,
Here is some additional information: before the exception appears, Nutch
logs two messages:
fetching http://cnn.com
org.apache.tika.mime.MimeUtils load
INFO loading [mime-types.xml]
fetch of http://www.cnn.com/ failed with: java.lang.NullPointerException
Fetcher: done
It seems the MIME type configuration has a problem. Do I need to configure the file it loads?
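To see how many URLs are affected, the log can be scanned for lines of the form shown above. This is only a rough sketch: the regular expression is based solely on the "fetch of ... failed with: ..." lines quoted in this thread, and the helper name failed_fetches is made up for illustration.

```python
import re

# Pattern taken from the log lines quoted in this thread (an assumption,
# not an official description of the Nutch log format).
FAILED = re.compile(r"fetch of (\S+) failed with: (\S+)")


def failed_fetches(log_lines):
    """Return a list of (url, error) pairs for every failed fetch in log_lines."""
    results = []
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results


if __name__ == "__main__":
    # Sample lines copied from the message above.
    sample = [
        "fetching http://cnn.com",
        "INFO loading [mime-types.xml]",
        "fetch of http://www.cnn.com/ failed with: java.lang.NullPointerException",
        "Fetcher: done",
    ]
    for url, error in failed_fetches(sample):
        print(url, error)
```

To run it against a real log, pass an open file handle for hadoop.log instead of the sample list. If every fetch fails with the same NullPointerException, that points at a configuration problem rather than at a particular site.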
--
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15189590.html
Re: Fetch issue with Feeds
Posted by Vinci <vi...@polyu.edu.hk>.
Hi All,
I get the same exception when I try the nightly build on a static page.
Can anyone help?
--
View this message in context: http://www.nabble.com/Fetch-issue-with-Feeds-tp15114911p15189123.html