Posted to user@nutch.apache.org by Nicolas MARTIN <ni...@gmail.com> on 2009/01/25 14:58:08 UTC
Running Nutch : plugin folder and hadoop configuration
Hi all,
I'm running Nutch v-0.9 under Eclipse GANYMEDE.
I've got the following log (last lines) :
java.lang.IllegalArgumentException: plugin.folders is not defined
    at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:72)
    at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
Here is my nutch-default conf file:
<!-- plugin properties -->
<property>
  <name>plugin.folders</name>
  <value>../src/plugin</value>
  <description>Directories where nutch plugins are located. Each
  element may be a relative or absolute path. If absolute, it is used
  as is. If relative, it is searched for on the classpath.</description>
</property>
Does anyone have an idea?
Thanks, all.
Re: Running Nutch : plugin folder and hadoop configuration
Posted by Nicolas MARTIN <ni...@gmail.com>.
OK, plugin.folders now points to build/plugins in my nutch-default conf file.
But the problem now seems to come from Hadoop, and I get:
2009-01-26 10:31:34,171 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
2009-01-26 10:31:36,093 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
2009-01-26 10:31:38,093 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
2009-01-26 10:31:40,109 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
2009-01-26 10:31:42,125 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
2009-01-26 10:31:44,140 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
2009-01-26 10:31:46,046 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
2009-01-26 10:31:48,062 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
2009-01-26 10:31:50,078 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
2009-01-26 10:31:52,078 INFO ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
Exception in thread "main" java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused: no further information
    at org.apache.hadoop.ipc.Client.call(Client.java:699)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:83)
Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:685)
    ... 12 more
Do I need to go through these instructions again to configure Hadoop?
http://hadoop.apache.org/core/docs/current/quickstart.html
Cheers,
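One possible reading of this trace, not confirmed anywhere in the thread: the client is trying to reach an HDFS NameNode at localhost:9000 and nothing is listening there. That would happen if the Hadoop configuration on the classpath sets fs.default.name to hdfs://localhost:9000 (as a pseudo-distributed setup typically does) while the Hadoop daemons have not been started. For a run inside Eclipse, a minimal sketch of a local-mode override might look like the following. The file name hadoop-site.xml is an assumption about this Hadoop version; the two values shown are Hadoop's stock local-mode settings.

```xml
<!-- Sketch only: assumed to live in conf/hadoop-site.xml on the classpath -->
<configuration>
  <!-- Use the local filesystem instead of an HDFS NameNode at localhost:9000 -->
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
  <!-- Run MapReduce jobs in-process instead of contacting a JobTracker -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
</configuration>
```

With these values the job would run through LocalJobRunner again, as in the first stack trace. Keeping hdfs://localhost:9000 instead would require the NameNode and the other daemons to be running before the crawl starts.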
Re: Running Nutch : plugin folder and hadoop configuration
Posted by Doğacan Güney <do...@gmail.com>.
2009/1/26 Nicolas MARTIN <ni...@gmail.com>:
> Hi,
>
> Now my nutch-default file looks like :
>
> <!-- plugin properties -->
>
> <property>
> <name>plugin.folders</name>
> <value>C:/Java/crawler_workspace/nutch/conf/src/plugin</value>
> <description>Directories where nutch plugins are located. Each
> element may be a relative or absolute path. If absolute, it is used
> as is. If relative, it is searched for on the classpath.</description>
> </property>
>
That path seems to point to the plugins' source code. You need the folder under build/.
Run "ant" in Nutch's home directory and you should get a build
directory; plugin.folders should then point to build/plugins.
> And i have added the conf folder to my classpath...
>
> But always the same error :
>
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
> at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
>
>
> If someone has an idea !
>
> TY all
>
--
Doğacan Güney
Re: Running Nutch : plugin folder and hadoop configuration
Posted by Nicolas MARTIN <ni...@gmail.com>.
Hi,
Now my nutch-default file looks like this:
<!-- plugin properties -->
<property>
  <name>plugin.folders</name>
  <value>C:/Java/crawler_workspace/nutch/conf/src/plugin</value>
  <description>Directories where nutch plugins are located. Each
  element may be a relative or absolute path. If absolute, it is used
  as is. If relative, it is searched for on the classpath.</description>
</property>
I have also added the conf folder to my classpath.
But I still get the same error:
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
Does anyone have an idea?
Thanks, all.
Re: Running Nutch : plugin folder and hadoop configuration
Posted by Doğacan Güney <do...@gmail.com>.
On Sun, Jan 25, 2009 at 3:58 PM, Nicolas MARTIN <ni...@gmail.com> wrote:
> Hi all,
>
> I'm running Nutch v-0.9 under Eclipse GANYMEDE.
> I've got the following log (last lines) :
>
> java.lang.IllegalArgumentException: plugin.folders is not defined
> at
> org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
> at
> org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:72)
> at
> org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
> at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
> at
> org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
> at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
> at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
> at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
> at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
>
> Here is my nutch-default conf file :
>
> <!-- plugin properties -->
>
> <property>
> <name>plugin.folders</name>
> <value>../src/plugin</value>
> <description>Directories where nutch plugins are located. Each
> element may be a relative or absolute path. If absolute, it is used
> as is. If relative, it is searched for on the classpath.</description>
> </property>
>
> Someone has an idea ??
>
Try adding your conf directory to the classpath in your run configuration. Also,
I think you will have to make plugin.folders point to an absolute path (which
is $NUTCH_HOME/build/plugins, not src/plugin).
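A sketch of the override this advice points to. It assumes a hypothetical install location of /path/to/nutch (substitute the real $NUTCH_HOME) and follows the usual Nutch convention of placing local overrides in conf/nutch-site.xml rather than editing nutch-default.xml:

```xml
<!-- Sketch only: conf/nutch-site.xml; /path/to/nutch is a placeholder for $NUTCH_HOME -->
<configuration>
  <property>
    <name>plugin.folders</name>
    <!-- Built plugins produced by running "ant", not the sources under src/plugin -->
    <value>/path/to/nutch/build/plugins</value>
  </property>
</configuration>
```

The conf directory holding this file still has to be on the run configuration's classpath. If it is not, the configuration files are presumably never loaded, which would explain why plugin.folders is reported as not defined.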
> TY all
>
--
Doğacan Güney