Posted to user@nutch.apache.org by Chip Calhoun <cc...@aip.org> on 2011/06/20 16:44:13 UTC

Questions about upgrade to Nutch 1.3

Hi everyone,
 
I'm a complete Nutch newbie.  I installed Nutch 1.2 and Solr 1.4.0 on my machine without any trouble.  I've decided to try Nutch 1.3 as it's compatible with Solr 3.1.0, which includes Solritas.  I hope you can help with some problems I'm having.
 
The Nutch documentation still describes a lot of operations happening from $NUTCH_HOME/, but they all apparently need to happen from $NUTCH_HOME/runtime/deploy or $NUTCH_HOME/runtime/local.  Which of these folders should I actually be using?
 
Has NutchBean been deprecated?  If so, how can I run a search and make sure my crawl worked?  I get no results when I try to search using Solr, so I'd like to figure out whether the problem is with my Nutch itself or with my attempt at integrating with Solr.
 
I get an error saying "solrurl is not set".  This seems to be new to Nutch 1.3.  Where do I set this?
 
If you can answer any of these, I'd appreciate it.  Thanks!
 
Chip

Re: Questions about upgrade to Nutch 1.3

Posted by Markus Jelsma <ma...@openindex.io>.
You can safely use 1.3 with Solr 3.1 and Velocity. I've got the stuff up and 
running as well.

On Tuesday 21 June 2011 15:45:53 Chip Calhoun wrote:
> Ahh, thanks again.  Based on your advice, I'm going back to Nutch 1.2 /
> Solr 1.4 and adding the Velocity contrib.  Once I get that working, I'll
> try with Nutch 1.3 again.
> 
> When I try to use Velocity now, I get this message:
> java.lang.RuntimeException: Can't find resource 'velocity.properties' in
> classpath or 'solr/conf/', cwd=C:\apache\apache-solr-1.4.0\example
> This is despite velocity.properties very definitely being in my
> C:\apache\apache-solr-1.4.0\example\solr\conf directory.  But I've veered
> completely into Solr territory now, so I guess that's off-topic.

The properties file is not in 3.1. I don't know about 1.4, but I don't think
it is there either.

> 
> >>> Markus Jelsma <ma...@openindex.io> 6/20/2011 12:43 PM >>>
> 
> On Monday 20 June 2011 18:35:36 Chip Calhoun wrote:
> > Thanks for replying!  I do still have a couple of questions:
> > > Markus Jelsma <ma...@openindex.io> 6/20/2011 11:34 AM >>>
> > > 
> > > > On Monday 20 June 2011 16:44:13 Chip Calhoun wrote:
> > > > Hi everyone,
> > > > 
> > > > I'm a complete Nutch newbie.  I installed Nutch 1.2 and Solr 1.4.0 on
> > > > my machine without any trouble.  I've decided to try Nutch 1.3 as
> > > > it's compatible with Solr 3.1.0, which includes Solritas.  I hope
> > > > you can help with some problems I'm having.
> > > 
> > > > Solr 1.4.x has Velocity as a contrib.
> > 
> > Does it?  Under 1.4.0 I could never get http://localhost:8983/solr/browse
> > to work.  I thought this was only added later.
> 
> libs must be added manually from contrib but it is shipped.
> 
> > > > I get an error saying "solrurl is not set".  This seems to be new to
> > > > Nutch 1.3.  Where do I set this?
> > > 
> > > According to the source you're using the crawl command.
> > > Usage: Crawl <urlDir> -solr <solrURL> [-dir d] [-threads n] [-depth i]
> > > [-topN N]
> > 
> > > Thanks, I hadn't known about the solrURL argument at all.  So would a
> > > valid usage be:
> > > bin/nutch crawl urls -solr http://127.0.0.1:8983 -dir solrcrawl -depth 10 -topN 50
> > > With the new solrURL argument, are there any steps I need to do after
> > > my crawl to get my content into Solr?
> 
> I think so but I don't use it. Please try.
> 
> > Thanks!

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: Questions about upgrade to Nutch 1.3

Posted by Chip Calhoun <cc...@aip.org>.
Ahh, thanks again.  Based on your advice, I'm going back to Nutch 1.2 / Solr 1.4 and adding the Velocity contrib.  Once I get that working, I'll try with Nutch 1.3 again.
 
When I try to use Velocity now, I get this message:
java.lang.RuntimeException: Can't find resource 'velocity.properties' in classpath or 'solr/conf/', cwd=C:\apache\apache-solr-1.4.0\example
This is despite velocity.properties very definitely being in my C:\apache\apache-solr-1.4.0\example\solr\conf directory.  But I've veered completely into Solr territory now, so I guess that's off-topic.

>>> Markus Jelsma <ma...@openindex.io> 6/20/2011 12:43 PM >>>
On Monday 20 June 2011 18:35:36 Chip Calhoun wrote:
> Thanks for replying!  I do still have a couple of questions:
> > Markus Jelsma <ma...@openindex.io> 6/20/2011 11:34 AM >>>
> > 
> > > On Monday 20 June 2011 16:44:13 Chip Calhoun wrote:
> > > Hi everyone,
> > > 
> > > I'm a complete Nutch newbie.  I installed Nutch 1.2 and Solr 1.4.0 on
> > > my machine without any trouble.  I've decided to try Nutch 1.3 as it's
> > > compatible with Solr 3.1.0, which includes Solritas.  I hope you can
> > > help with some problems I'm having.
> > 
> > Solr 1.4.x has Velocity as a contrib.
> 
> Does it?  Under 1.4.0 I could never get http://localhost:8983/solr/browse
> to work.  I thought this was only added later.

libs must be added manually from contrib but it is shipped.

> 
> > > I get an error saying "solrurl is not set".  This seems to be new to
> > > Nutch 1.3.  Where do I set this?
> > 
> > According to the source you're using the crawl command.
> > Usage: Crawl <urlDir> -solr <solrURL> [-dir d] [-threads n] [-depth i]
> > [-topN N]
> 
> Thanks, I hadn't known about the solrURL argument at all.  So would a valid
> usage be:
> bin/nutch crawl urls -solr http://127.0.0.1:8983 -dir solrcrawl -depth 10 -topN 50
> With the new solrURL argument, are there any steps I need to do after my
> crawl to get my content into Solr?

I think so but I don't use it. Please try.

> 
> Thanks!

Re: Questions about upgrade to Nutch 1.3

Posted by Markus Jelsma <ma...@openindex.io>.

On Monday 20 June 2011 18:35:36 Chip Calhoun wrote:
> Thanks for replying!  I do still have a couple of questions:
> > Markus Jelsma <ma...@openindex.io> 6/20/2011 11:34 AM >>>
> > 
> > > On Monday 20 June 2011 16:44:13 Chip Calhoun wrote:
> > > Hi everyone,
> > > 
> > > I'm a complete Nutch newbie.  I installed Nutch 1.2 and Solr 1.4.0 on
> > > my machine without any trouble.  I've decided to try Nutch 1.3 as it's
> > > compatible with Solr 3.1.0, which includes Solritas.  I hope you can
> > > help with some problems I'm having.
> > 
> > Solr 1.4.x has Velocity as a contrib.
> 
> Does it?  Under 1.4.0 I could never get http://localhost:8983/solr/browse
> to work.  I thought this was only added later.

libs must be added manually from contrib but it is shipped.

> 
> > > I get an error saying "solrurl is not set".  This seems to be new to
> > > Nutch 1.3.  Where do I set this?
> > 
> > According to the source you're using the crawl command.
> > Usage: Crawl <urlDir> -solr <solrURL> [-dir d] [-threads n] [-depth i]
> > [-topN N]
> 
> Thanks, I hadn't known about the solrURL argument at all.  So would a valid
> usage be:
> bin/nutch crawl urls -solr http://127.0.0.1:8983 -dir solrcrawl -depth 10 -topN 50
> With the new solrURL argument, are there any steps I need to do after my
> crawl to get my content into Solr?

I think so but I don't use it. Please try.

> 
> Thanks!

Re: Questions about upgrade to Nutch 1.3

Posted by Chip Calhoun <cc...@aip.org>.
Thanks for replying!  I do still have a couple of questions:

> Markus Jelsma <ma...@openindex.io> 6/20/2011 11:34 AM >>>
> > On Monday 20 June 2011 16:44:13 Chip Calhoun wrote:
> > Hi everyone,
> > 
> > I'm a complete Nutch newbie.  I installed Nutch 1.2 and Solr 1.4.0 on my
> > machine without any trouble.  I've decided to try Nutch 1.3 as it's
> > compatible with Solr 3.1.0, which includes Solritas.  I hope you can help
> > with some problems I'm having.
> 
> Solr 1.4.x has Velocity as a contrib.
Does it?  Under 1.4.0 I could never get http://localhost:8983/solr/browse to work.  I thought this was only added later.

> > I get an error saying "solrurl is not set".  This seems to be new to Nutch
> > 1.3.  Where do I set this?
> 
> According to the source you're using the crawl command. 
> Usage: Crawl <urlDir> -solr <solrURL> [-dir d] [-threads n] [-depth i] [-topN N]
 
Thanks, I hadn't known about the solrURL argument at all.  So would a valid usage be:
bin/nutch crawl urls -solr http://127.0.0.1:8983 -dir solrcrawl -depth 10 -topN 50
With the new solrURL argument, are there any steps I need to do after my crawl to get my content into Solr?
 
Thanks!

 

Re: Questions about upgrade to Nutch 1.3

Posted by Markus Jelsma <ma...@openindex.io>.

On Monday 20 June 2011 16:44:13 Chip Calhoun wrote:
> Hi everyone,
> 
> I'm a complete Nutch newbie.  I installed Nutch 1.2 and Solr 1.4.0 on my
> machine without any trouble.  I've decided to try Nutch 1.3 as it's
> compatible with Solr 3.1.0, which includes Solritas.  I hope you can help
> with some problems I'm having.

Solr 1.4.x has Velocity as a contrib.

> 
> The Nutch documentation still describes a lot of operations happening from
> $NUTCH_HOME/, but they all apparently need to happen from
> $NUTCH_HOME/runtime/deploy or $NUTCH_HOME/runtime/local.  Which of these
> folders should I actually be using?

Use local if you're not running on Hadoop. You can then consider runtime/local 
as your NUTCH_HOME.
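
A minimal sketch of that choice, assuming a hypothetical helper named
nutch_runtime_dir and an example install path of /opt/nutch-1.3 (neither is
part of Nutch itself):

```shell
#!/bin/sh
# Hypothetical helper: pick the Nutch 1.3 runtime directory to work from.
#   $1 = directory where Nutch 1.3 was unpacked and built
#   $2 = "hadoop" when running on a Hadoop cluster, anything else for local mode
nutch_runtime_dir() {
  case "$2" in
    hadoop) printf '%s/runtime/deploy\n' "$1" ;;
    *)      printf '%s/runtime/local\n' "$1" ;;
  esac
}

# Assumption: /opt/nutch-1.3 is only an example install path.
NUTCH_HOME="$(nutch_runtime_dir /opt/nutch-1.3 local)"
echo "$NUTCH_HOME"
```

Commands written in the older documentation as relative to $NUTCH_HOME can
then be run from the directory this prints.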

> 
> Has NutchBean been deprecated?  If so, how can I run a search and make sure
> my crawl worked?  I get no results when I try to search using Solr, so I'd
> like to figure out whether the problem is with my Nutch itself or with my
> attempt at integrating with Solr.

NutchBean was for Nutch's own search, which is gone in 1.3.
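
Without NutchBean, one way to tell a crawl problem from a Solr problem is to
check each side on its own. The sketch below only prints two commands:
readdb with -stats summarises the crawl database, and a *:* query with rows=0
asks Solr how many documents it holds. The solrcrawl directory and the Solr
URL are assumptions taken from elsewhere in this thread (the /solr path is
inferred from the /solr/browse URL), and check_cmds is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: prints two checks, it does not run them.
#   $1 = crawl directory passed to -dir (assumed to contain crawldb/)
#   $2 = Solr base URL (assumed to end in /solr)
check_cmds() {
  # 1. Did Nutch fetch anything? Prints status counts for the crawl database.
  printf 'bin/nutch readdb %s/crawldb -stats\n' "$1"
  # 2. Did anything reach Solr? numFound in the response is the document count.
  printf "curl '%s/select?q=*:*&rows=0'\n" "$2"
}

check_cmds solrcrawl http://127.0.0.1:8983/solr
```

If the first command reports fetched pages but the Solr query reports no
documents, the problem is more likely on the indexing side than in the crawl.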

> 
> I get an error saying "solrurl is not set".  This seems to be new to Nutch
> 1.3.  Where do I set this?

According to the source you're using the crawl command. 
Usage: Crawl <urlDir> -solr <solrURL> [-dir d] [-threads n] [-depth i] [-topN N]
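
As a sketch of that usage, the command line can be assembled as below. The
urls and solrcrawl directory names and the depth and topN values are only
examples, and the Solr URL is an assumption: the /solr path is inferred from
the /solr/browse URL mentioned in this thread, so adjust it to your own
instance. crawl_cmd is a hypothetical helper that prints the command rather
than running it.

```shell
#!/bin/sh
# Hypothetical helper: prints a Nutch 1.3 crawl command following
#   Usage: Crawl <urlDir> -solr <solrURL> [-dir d] [-threads n] [-depth i] [-topN N]
# It does not run Nutch; run the printed line from runtime/local to execute it.

# Assumption: Solr is reachable at this URL; override SOLR_URL if it is not.
SOLR_URL="${SOLR_URL:-http://127.0.0.1:8983/solr}"

crawl_cmd() {
  # $1 = seed URL directory, $2 = crawl output directory, $3 = depth, $4 = topN
  printf 'bin/nutch crawl %s -solr %s -dir %s -depth %s -topN %s\n' \
    "$1" "$SOLR_URL" "$2" "$3" "$4"
}

crawl_cmd urls solrcrawl 10 50
```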

> 
> If you can answer any of these, I'd appreciate it.  Thanks!
> 
> Chip