Posted to user@nutch.apache.org by Berlin Brown <be...@gmail.com> on 2007/06/02 09:20:36 UTC
Error with the inject command
I am getting an error when I try to run the inject command.
This is what I have done:
mkdir dmoz
bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 5000 > dmoz/urls
The error occurs here:
bin/nutch inject crawl/crawldb dmoz
2007-06-02 02:37:19,796 WARN plugin.PluginRepository - Plugins: not a file: url. Can't load plugins from: jar:file:/C:/Berlin/Downloads4/workspaceTrunk/BotListProjects/botcrawl/nutch/nutch-0.9.job!/plugins
2007-06-02 02:37:19,812 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2007-06-02 02:37:19,812 INFO plugin.PluginRepository - Registered Plugins:
2007-06-02 02:37:19,812 INFO plugin.PluginRepository - NONE
2007-06-02 02:37:19,812 INFO plugin.PluginRepository - Registered Extension-Points:
2007-06-02 02:37:19,812 INFO plugin.PluginRepository - NONE
2007-06-02 02:37:19,812 WARN mapred.LocalJobRunner - job_5ysi6h
java.lang.RuntimeException: x point org.apache.nutch.net.URLNormalizer not found.
at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:120)
at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injecto
--
Berlin Brown
http://www.newspiritcompany.com - newspirit technologies
Re: Error with the inject command
Posted by Mathijs Homminga <ma...@knowlogy.nl>.
It looks like the plugins cannot be found.
I get the same errors as you when I delete the plugins/
subdirectory. Is yours still there?
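One quick way to check this from the shell, as a minimal sketch only: the `count_plugins` helper below is made up for illustration, it takes the Nutch install directory as its argument, and it assumes each plugin sits in its own subdirectory of plugins/ with a plugin.xml descriptor.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): count plugin descriptors
# under <nutch_dir>/plugins. Prints the number of plugin.xml files found
# and returns non-zero if the plugins/ directory is missing.
count_plugins() {
    nutch_dir="$1"
    if [ ! -d "$nutch_dir/plugins" ]; then
        echo "no plugins/ directory under $nutch_dir" >&2
        return 1
    fi
    # Assumption: one plugin.xml per plugin subdirectory.
    find "$nutch_dir/plugins" -name plugin.xml | wc -l | tr -d ' '
}

# Example, run from the directory assumed to be the Nutch root:
#   count_plugins .
```

If the count is 0 or the directory is missing, that would be consistent with the "Registered Plugins: NONE" lines in the log, though it does not by itself prove the cause.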
Berlin Brown wrote:
> Anybody? I still can't figure it out. I even created the crawl/crawldb
> directory. Nothing. The URLs are just a set of URLs. I am using
> 0.9.1; is it a bug, maybe?
--
Knowlogy
Helperpark 290 C
9723 ZA Groningen
mathijs.homminga@knowlogy.nl
+31 (0)6 15312977
http://www.knowlogy.nl
RE: Error with the inject command
Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Berlin,
Nutch needs a file called urls.txt inside the directory that you are
passing to the inject command. Try renaming the urls file to urls.txt.
Also, are you using the local FS or Hadoop DFS? If it's the latter, you'll
have to put your dmoz directory on the Hadoop DFS.
-vishal.
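The two suggestions above could be sketched in shell roughly as follows. This is an illustration, not a verified fix: the `prepare_seed_dir` helper is hypothetical, and the commented `bin/hadoop dfs -put` line assumes the Hadoop scripts bundled with Nutch are run from the Nutch root.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): rename <dir>/urls to
# <dir>/urls.txt if the latter does not exist yet.
prepare_seed_dir() {
    dir="$1"
    if [ -f "$dir/urls" ] && [ ! -e "$dir/urls.txt" ]; then
        mv "$dir/urls" "$dir/urls.txt"
    fi
    # Succeed only if the seed file is now in place.
    [ -f "$dir/urls.txt" ]
}

# Local filesystem:
#   prepare_seed_dir dmoz && bin/nutch inject crawl/crawldb dmoz
# Hadoop DFS (assumption: copy the directory up before injecting):
#   bin/hadoop dfs -put dmoz dmoz
```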
Re: Error with the inject command
Posted by Berlin Brown <be...@gmail.com>.
Anybody? I still can't figure it out. I even created the crawl/crawldb
directory. Nothing. The URLs are just a set of URLs. I am using
0.9.1; is it a bug, maybe?