Posted to user@nutch.apache.org by Kham Vo <kv...@mac.com> on 2009/02/21 02:31:32 UTC
Nutch 1.0 - Setting up and running Nutch for crawling and Solr for
indexing and querying.
Hello Nutch 1.0 designers,
I successfully installed and set up Nutch 1.0 (build #722). I ran
bin/nutch crawl urls -dir crawl -depth 3 -topN 50 and it seemed to work,
fetching data from the specified sites with no errors. My question is:
do I need to do anything special to get Nutch to post the data to
another instance of apache-solr running at http://localhost:8983 for
indexing? I searched for documentation on how to correctly set up
Nutch 1.0 so that Nutch does the crawling and Solr does the indexing
and display, but have found nothing so far.
Your help is greatly appreciated.
Kham
java.lang.NullPointerException
Posted by al...@aim.com.
Hello,
I am using Nutch 0.9 to index files. However, Nutch spends less than 1 second fetching those files and then throws
java.lang.NullPointerException.
As far as I can see from the plugin's code, Nutch downloads the content to a temp file and then parses it. So the problem is that Nutch does not download the whole file for some reason. I set http.timeout to
<value>100000</value>, but it did not help.
Thanks for any ideas.
A.
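For reference, a setting like the http.timeout value mentioned above is normally written as a full property entry. The fragment below is only a sketch of that shape: the property name and value are taken from the message, while placing it in conf/nutch-site.xml is an assumption, since the message does not say which file was edited. As far as the Nutch defaults document it, http.timeout is given in milliseconds, so this value would correspond to 100 seconds.

```xml
<!-- Assumed location: conf/nutch-site.xml, inside the <configuration> element -->
<property>
  <name>http.timeout</name>
  <!-- Value quoted in the message above; network timeout in milliseconds -->
  <value>100000</value>
</property>
```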
Re: Nutch 1.0 - Setting up and running Nutch for crawling and Solr
for indexing and querying.
Posted by Sami Siren <ss...@gmail.com>.
Tony Wang wrote:
> I don't see that Nutch 1.0 has been released. Where did you download it?
>
Nutch 1.0 has not been released yet; the community is working to get it
out as we speak. There are still some issues that need to be fixed
before the release can take place. Everybody's involvement in testing
the current nightly builds and providing documentation patches or wiki
updates is appreciated.
--
Sami Siren
> nightly build? thanks
>
> On Fri, Feb 20, 2009 at 6:31 PM, Kham Vo <kv...@mac.com> wrote:
>
>
>> Hello Nutch 1.0 designers,
>>
>> I successfully installed and set up Nutch 1.0 (build # 722). Ran bin/nutch
>> crawl urls -dir crawl -depth 3 -topN 50 and it seemed to work, fetching data
>> from specified sites. No error. My question is do I need to do anything
>> special in order to get Nutch to post the data to another instance of
>> apache-solr running at http://localhost:8983 for indexing. I googled for
>> any documentation on how to correctly set up Nutch 1.0 such that nutch is
>> for crawling and solr is for indexing and display. Nothing so far.
>>
>> Your help is greatly appreciated.
>>
>> Kham
Re: Nutch 1.0 - Setting up and running Nutch for crawling and Solr
for indexing and querying.
Posted by Tony Wang <iv...@gmail.com>.
I don't see that Nutch 1.0 has been released. Where did you download it?
Nightly build? Thanks.
On Fri, Feb 20, 2009 at 6:31 PM, Kham Vo <kv...@mac.com> wrote:
> Hello Nutch 1.0 designers,
>
> I successfully installed and set up Nutch 1.0 (build # 722). Ran bin/nutch
> crawl urls -dir crawl -depth 3 -topN 50 and it seemed to work, fetching data
> from specified sites. No error. My question is do I need to do anything
> special in order to get Nutch to post the data to another instance of
> apache-solr running at http://localhost:8983 for indexing. I googled for
> any documentation on how to correctly set up Nutch 1.0 such that nutch is
> for crawling and solr is for indexing and display. Nothing so far.
>
> Your help is greatly appreciated.
>
> Kham
>
--
Are you RCholic? www.RCholic.com
Gentleness, kindness, courtesy, frugality, deference; benevolence, righteousness, propriety, wisdom, trustworthiness