
Posted to user@nutch.apache.org by Sami Siren <ss...@gmail.com> on 2009/02/23 07:31:11 UTC

Re: Nutch 1.0 - Setting up and running Nutch for crawling and Solr for indexing and querying.

Tony Wang wrote:
> I don't see that Nutch 1.0 has been released. Where did you download it?
>   
Nutch 1.0 has not been released yet; the community is working to get it
out as we speak. There are still some issues that need to be fixed
before the release can take place. Everybody's involvement in testing
the current nightly builds and providing documentation patches or wiki
updates is appreciated.

--
 Sami Siren
> nightly build? thanks
>
> On Fri, Feb 20, 2009 at 6:31 PM, Kham Vo <kv...@mac.com> wrote:
>
>   
>> Hello Nutch 1.0 designers,
>>
>> I successfully installed and set up Nutch 1.0 (build # 722).  Ran bin/nutch
>> crawl urls -dir crawl -depth 3 -topN 50 and it seemed to work, fetching data
>> from specified sites.  No error.  My question is do I need to do anything
>> special in order to get Nutch to post the data to another instance of
>> apache-solr running at http://localhost:8983 for indexing.  I googled for
>> any documentation on how to correctly set up Nutch 1.0 such that nutch is
>> for crawling and solr is for indexing and display.  Nothing so far.
>>
>> Your help is greatly appreciated.
>>
>> Kham
>>
>>     
>
>
>
>

java.lang.NullPointerException

Posted by al...@aim.com.

Hello,

I am using Nutch 0.9 to index files. However, Nutch spends less than 1 second fetching those files and then throws a java.lang.NullPointerException.

As far as I can see from the plugin's code, Nutch downloads the content to a temp file and then parses it. So the problem seems to be that Nutch does not download the whole file for some reason. I set http.timeout to <value>100000</value>, but it did not help.
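
For reference, a minimal sketch of how that setting would look as a complete property entry. It assumes the override is placed in conf/nutch-site.xml; the file location and the comments are assumptions, not taken from the message. Only the property name and value come from the text above, and the value is interpreted as milliseconds.

```xml
<!-- Assumed location: conf/nutch-site.xml, which overrides nutch-default.xml -->
<property>
  <name>http.timeout</name>
  <!-- Network timeout, in milliseconds; 100000 is the value quoted in the message -->
  <value>100000</value>
</property>
```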

Thanks for any ideas.
A.