Posted to user@nutch.apache.org by Berlin Brown <be...@gmail.com> on 2007/06/02 09:20:36 UTC

Error with the inject command

I get an error when I run the inject command.
Here is what I did first:

mkdir dmoz
bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 5000 > dmoz/urls

The error occurs at this step:

bin/nutch inject crawl/crawldb dmoz


2007-06-02 02:37:19,796 WARN  plugin.PluginRepository - Plugins: not a file: url. Can't load plugins from: jar:file:/C:/Berlin/Downloads4/workspaceTrunk/BotListProjects/botcrawl/nutch/nutch-0.9.job!/plugins
2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - Registered Plugins:
2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - 	NONE
2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - Registered Extension-Points:
2007-06-02 02:37:19,812 INFO  plugin.PluginRepository - 	NONE
2007-06-02 02:37:19,812 WARN  mapred.LocalJobRunner - job_5ysi6h
java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
	at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:120)
	at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injecto

-- 
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies

Re: Error with the inject command

Posted by Mathijs Homminga <ma...@knowlogy.nl>.
It looks like the plugins cannot be found.
I get the same errors as you when I delete the plugins/
subdirectory. Is yours still there?
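
A quick way to check this from the Nutch install directory is sketched
below. It assumes the usual layout, where each plugin sits in its own
subdirectory of plugins/ together with a plugin.xml descriptor. The
function name check_plugins is made up for this example. If the
directory is present but still not picked up, the plugin.folders
property in the Nutch configuration is probably the next thing to look
at.

```shell
#!/bin/sh
# Hypothetical helper: report whether a plugin directory looks usable.
# Assumption: every plugin lives in its own subdirectory with a plugin.xml.
check_plugins() {
    dir="${1:-plugins}"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    count=0
    for f in "$dir"/*/plugin.xml; do
        # The glob stays unexpanded when nothing matches, so test each hit.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "found $count plugin descriptor(s) in $dir"
    # Succeed only if at least one descriptor was found.
    [ "$count" -gt 0 ]
}
```

Calling check_plugins plugins from the directory that contains bin/nutch
should report a non-zero count if the plugins are where Nutch expects
them by default.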

Berlin Brown wrote:
> Anybody? I still can't figure it out. I even created the
> crawl/crawldb directory, but it made no difference. The URLs file is
> just a set of URLs. I am using 0.9.1. Could it be a bug?


-- 
Knowlogy
Helperpark 290 C
9723 ZA Groningen

mathijs.homminga@knowlogy.nl
+31 (0)6 15312977
http://www.knowlogy.nl



RE: Error with the inject command

Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Berlin,

Nutch needs a file called urls.txt inside the directory that you are
passing to the inject command. Try renaming the urls file to urls.txt.

Also, are you using the local FS or Hadoop DFS? If it's the latter, you'll
have to put your dmoz directory on the Hadoop FS.
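
The two suggestions above could be tried roughly as follows. This is
only a sketch: prepare_seed_dir is a made-up helper name, and the
bin/hadoop line (left commented out) assumes a Hadoop DFS setup with
the bin/hadoop script available. It is not needed on the local file
system.

```shell
#!/bin/sh
# Hypothetical helper: rename the seed file to urls.txt and print how
# many non-empty lines (URLs) it contains.
prepare_seed_dir() {
    dir="$1"
    # Rename only if the old name exists and the new one does not.
    if [ -f "$dir/urls" ] && [ ! -e "$dir/urls.txt" ]; then
        mv "$dir/urls" "$dir/urls.txt"
    fi
    if [ ! -f "$dir/urls.txt" ]; then
        echo "no seed file in $dir" >&2
        return 1
    fi
    # Count lines that contain at least one character.
    grep -c . "$dir/urls.txt"
}

# Only when running on Hadoop DFS (assumption, not needed for local FS):
# bin/hadoop dfs -put dmoz dmoz
```

After prepare_seed_dir dmoz reports a sensible URL count, the inject
command from the original message can be run again unchanged.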

-vishal.

-----Original Message-----
From: Berlin Brown [mailto:berlin.brown@gmail.com] 
Sent: Sunday, June 03, 2007 5:41 AM
To: nutch-user@lucene.apache.org
Subject: Re: Error with the inject command

Anybody? I still can't figure it out. I even created the
crawl/crawldb directory, but it made no difference. The URLs file is
just a set of URLs. I am using 0.9.1. Could it be a bug?



Re: Error with the inject command

Posted by Berlin Brown <be...@gmail.com>.
Anybody? I still can't figure it out. I even created the
crawl/crawldb directory, but it made no difference. The URLs file is
just a set of URLs. I am using 0.9.1. Could it be a bug?



-- 
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies