Posted to user@nutch.apache.org by Jay <sa...@blastsms.com> on 2010/08/16 16:45:27 UTC

Nutch w Eclipse

Hi,

I have been trying all day to get Nutch going in Eclipse.

I am now getting this error:

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2010-08-16 21:38:36
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
	at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

This error happens when I follow this guide:
http://wiki.apache.org/nutch/RunNutchInEclipse1.0
but with 1.2 from the SVN tag.

I have tried several versions: trunk, 1.1 from the site's .gz download, and
1.2 from the SVN tags. Each fails with its own errors.

I will try 1.0, but I am hoping to use the latest version of Nutch.

Regards,
J

Re: Nutch w Eclipse

Posted by Hannes Carl Meyer <ha...@googlemail.com>.
The .template files are for documentation/example purposes only!
You need the real files: nutch-site.xml, crawl-urlfilter.txt, and so on.
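
One cautious way to get the real files is to copy each template instead of
renaming it, so the originals stay available. The sketch below is only an
illustration: the function name materialize_templates is made up for this
example, and it assumes every *.template file sits directly in the conf
directory. It skips any target that already exists.

```python
import shutil
import tempfile
from pathlib import Path


def materialize_templates(conf_dir):
    """Copy each *.template file to the same name without the suffix.

    Existing targets are left untouched and the templates stay in place.
    Returns the names of the files that were created.
    """
    created = []
    for template in sorted(Path(conf_dir).glob("*.template")):
        # nutch-site.xml.template -> nutch-site.xml
        target = template.with_suffix("")
        if not target.exists():
            shutil.copyfile(template, target)
            created.append(target.name)
    return created


if __name__ == "__main__":
    # Demo on a throwaway directory; pass your own conf/ path instead.
    with tempfile.TemporaryDirectory() as demo_conf:
        Path(demo_conf, "nutch-site.xml.template").write_text("<configuration/>\n")
        print(materialize_templates(demo_conf))
```

Whether the ant build relies on the templates being present is not settled in
this thread, which is why the sketch copies rather than renames.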

On Mon, Aug 16, 2010 at 6:50 PM, Jay <sa...@blastsms.com> wrote:

> Thanks so much Hannes.
>
> The issue was quite a silly one (as usual..)
>
> Some of the conf files still had the .template suffix, so I renamed them
> all and it's working fine.
>
> Will this affect the ant job when I decide to compile it outside Eclipse?
>
> Once again, thanks very much..



-- 

https://www.xing.com/profile/HannesCarl_Meyer
http://de.linkedin.com/in/hannescarlmeyer
http://twitter.com/hannescarlmeyer

Re: Nutch w Eclipse

Posted by Jay <sa...@blastsms.com>.
Thanks so much Hannes.

The issue was quite a silly one (as usual..)

Some of the conf files still had the .template suffix, so I renamed them all
and it's working fine.

Will this affect the ant job when I decide to compile it outside Eclipse?

Once again, thanks very much..

On Mon, Aug 16, 2010 at 11:40 PM, Hannes Carl Meyer <
hannescarl@googlemail.com> wrote:

> Hi J,
>
> You should check logs/hadoop.log for further error messages!
>
> Bests
>
> Hannes



-- 
Regards,
James Erwin
BlastSMS
Mobile Messaging Service


Re: Nutch w Eclipse

Posted by Hannes Carl Meyer <ha...@googlemail.com>.
Hi J,

You should check logs/hadoop.log for further error messages!
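
If the log is long, a small filter can pull out the lines worth reading first.
This is a rough sketch, not a Nutch or Hadoop tool: the function name
find_problems and the pattern are assumptions made for this example, and the
pattern only looks for ERROR/FATAL levels and Java-style exception names.

```python
import re
from pathlib import Path

# Lines that usually signal trouble: ERROR/FATAL log levels, or a Java
# class name ending in Exception or Error.
PROBLEM = re.compile(r"\b(?:ERROR|FATAL)\b|\b\w+(?:Exception|Error)\b")


def find_problems(log_text):
    """Return (line_number, line) pairs for lines that look like errors."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if PROBLEM.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    log_file = Path("logs/hadoop.log")  # path as given in this thread
    if log_file.exists():
        for number, line in find_problems(log_file.read_text(errors="replace")):
            print(f"{number}: {line}")
```

The stack-trace lines that follow each hit usually matter too, so open the log
at the reported line numbers rather than relying on the matches alone.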

Bests

Hannes

On Mon, Aug 16, 2010 at 6:37 PM, Jay <sa...@blastsms.com> wrote:

> After doing all the steps again, I am now getting this.
>
> Nutch 1.2
>
> Getting closer! (I think)
>
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> indexer=lucene
> topN = 50
> Injector: starting at 2010-08-16 23:19:52
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> *Skipping http://lucene.apache.org/nutch/:java.lang.NullPointerException*
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2010-08-16 23:19:55, elapsed: 00:00:02
> Generator: starting at 2010-08-16 23:19:55
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 50
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: 0 records selected for fetching, exiting ...
> Stopping at depth=0 - no more URLs to fetch.
> No URLs to fetch - check your seed list and URL filters.
> crawl finished: crawl
>
>
> I will continue to investigate, but would really appreciate some help ;)
>
> J
>

Re: Nutch w Eclipse

Posted by Jay <sa...@blastsms.com>.
After doing all the steps again, I am now getting this.

Nutch 1.2

Getting closer! (I think)

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2010-08-16 23:19:52
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
*Skipping http://lucene.apache.org/nutch/:java.lang.NullPointerException*
Injector: Merging injected urls into crawl db.
Injector: finished at 2010-08-16 23:19:55, elapsed: 00:00:02
Generator: starting at 2010-08-16 23:19:55
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl
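
The last log line suggests checking the seed list against the URL filters. The
sketch below imitates that check outside Nutch, under stated assumptions: rule
lines start with + (accept) or - (reject) followed by a regex, # starts a
comment, the first rule found in the URL decides, and a URL matching no rule
is rejected. The rules shown are hypothetical, not the contents of any shipped
file, and Python's re module is not identical to Java's regex engine.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line[0] not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules, url):
    """The first rule whose pattern is found in the URL decides the outcome."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # no rule matched: treat the URL as rejected


# Hypothetical rules for illustration only.
EXAMPLE_RULES = r"""
# skip non-http schemes
-^(file|ftp|mailto):
# accept hosts under apache.org
+^http://([a-z0-9]*\.)*apache\.org/
# reject everything else
-.
"""

if __name__ == "__main__":
    rules = parse_rules(EXAMPLE_RULES)
    for url in ("http://lucene.apache.org/nutch/", "http://example.com/"):
        print(url, "accepted" if accepts(rules, url) else "rejected")
```

A seed that this kind of check accepts can still be skipped for other reasons,
such as the NullPointerException shown in the log above.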


I will continue to investigate, but would really appreciate some help ;)

J