Posted to user@nutch.apache.org by Nicolas MARTIN <ni...@gmail.com> on 2009/01/25 14:58:08 UTC

Running Nutch : plugin folder and hadoop configuration

Hi all,

I'm running Nutch 0.9 under Eclipse Ganymede.
Here are the last lines of the log:

java.lang.IllegalArgumentException: plugin.folders is not defined
    at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
    at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:72)
    at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
    at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)

Here is the plugin section of my nutch-default conf file:

<!-- plugin properties -->

<property>
  <name>plugin.folders</name>
  <value>../src/plugin</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>

Does anyone have an idea?

TY all

Re: Running Nutch : plugin folder and hadoop configuration

Posted by Nicolas MARTIN <ni...@gmail.com>.
OK, plugin.folders now points to build/plugins in my nutch-default conf file.
But the problem now seems to come from Hadoop, and I get:

2009-01-26 10:31:34,171 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
2009-01-26 10:31:36,093 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
2009-01-26 10:31:38,093 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
2009-01-26 10:31:40,109 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
2009-01-26 10:31:42,125 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
2009-01-26 10:31:44,140 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
2009-01-26 10:31:46,046 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
2009-01-26 10:31:48,062 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
2009-01-26 10:31:50,078 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
2009-01-26 10:31:52,078 INFO  ipc.Client (Client.java:handleConnectionFailure(364)) - Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
Exception in thread "main" java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: Connection refused: no further information
    at org.apache.hadoop.ipc.Client.call(Client.java:699)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:74)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1367)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:83)
Caused by: java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:299)
    at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:772)
    at org.apache.hadoop.ipc.Client.call(Client.java:685)
    ... 12 more

Do I have to go through these instructions again to configure Hadoop?
http://hadoop.apache.org/core/docs/current/quickstart.html

Cheers,
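
The trace above shows the HDFS client (DFSClient) being refused at localhost/127.0.0.1:9000, which suggests that nothing is listening on that port. A minimal sketch, independent of Hadoop, for confirming that from the same machine; the class name PortCheck and the 2-second timeout are illustrative choices, not anything from Nutch or Hadoop:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether anything accepts TCP connections on host:port.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, unknown host, and similar failures all end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the address in the log; override with: java PortCheck <host> <port>
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        boolean open = isListening(host, port, 2000);
        System.out.println(host + ":" + port
                + (open ? " is accepting connections" : " refused the connection or timed out"));
    }
}
```

If the port is refused here too, the problem is on the Hadoop side (no server started on that address) rather than in the Nutch plugin settings.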


Re: Running Nutch : plugin folder and hadoop configuration

Posted by Doğacan Güney <do...@gmail.com>.
2009/1/26 Nicolas MARTIN <ni...@gmail.com>:
> Hi,
>
> Now my nutch-default file looks like :
>
> <!-- plugin properties -->
>
> <property>
>  <name>plugin.folders</name>
>  <value>C:/Java/crawler_workspace/nutch/conf/src/plugin</value>
>  <description>Directories where nutch plugins are located.  Each
>  element may be a relative or absolute path.  If absolute, it is used
>  as is.  If relative, it is searched for on the classpath.</description>
> </property>
>

This seems to be source code for plugins. You need the folder under build/.

Run "ant" in Nutch's home directory and you should get a build
directory. plugin.folders should then point to build/plugins.
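
A rough way to tell a built plugin folder from the source tree, sketched under the assumption that each built plugin directory holds a plugin.xml next to at least one .jar file (that layout is my assumption about what the ant build produces, and the class name PluginFolderCheck is made up for this example):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists plugin directories that look built (plugin.xml plus a jar).
final class PluginFolderCheck {

    /** True if dir directly contains at least one *.jar file. */
    static boolean containsJar(Path dir) throws IOException {
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            return jars.iterator().hasNext();
        }
    }

    /** Names of subdirectories of pluginRoot that hold both plugin.xml and a jar, sorted. */
    static List<String> builtPlugins(Path pluginRoot) throws IOException {
        List<String> names = new ArrayList<>();
        if (!Files.isDirectory(pluginRoot)) {
            return names; // missing or non-directory root: nothing to report
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(pluginRoot)) {
            for (Path child : children) {
                if (Files.isDirectory(child)
                        && Files.isRegularFile(child.resolve("plugin.xml"))
                        && containsJar(child)) {
                    names.add(child.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Pass the value you put in plugin.folders; defaults to the relative build/plugins.
        Path root = Paths.get(args.length > 0 ? args[0] : "build/plugins");
        List<String> plugins = builtPlugins(root);
        System.out.println(root.toAbsolutePath() + ": " + plugins.size()
                + " built plugin(s) " + plugins);
    }
}
```

Pointing it at a source folder such as src/plugin should report zero built plugins if the assumption about jars holds.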


> And i have added the conf folder to my classpath...
>
> But always the same error :
>
> Exception in thread "main" java.io.IOException: Job failed!
>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
>    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
>
>
> If someone has an idea !
>
> TY all
>



-- 
Doğacan Güney

Re: Running Nutch : plugin folder and hadoop configuration

Posted by Nicolas MARTIN <ni...@gmail.com>.
Hi,

Now my nutch-default file looks like this:

<!-- plugin properties -->

<property>
  <name>plugin.folders</name>
  <value>C:/Java/crawler_workspace/nutch/conf/src/plugin</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>

And I have added the conf folder to my classpath...

But I still get the same error:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)


Does anyone have an idea?

TY all


Re: Running Nutch : plugin folder and hadoop configuration

Posted by Doğacan Güney <do...@gmail.com>.
On Sun, Jan 25, 2009 at 3:58 PM, Nicolas MARTIN <ni...@gmail.com> wrote:
> Hi all,
>
> I'm running Nutch v-0.9 under Eclipse GANYMEDE.
> I've got the following log (last lines)  :
>
> java.lang.IllegalArgumentException: plugin.folders is not defined
>    at
> org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:78)
>    at
> org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:72)
>    at
> org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:95)
>    at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
>    at
> org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:57)
>    at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>    at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
>    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>    at
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>    at
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
>    at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
> Exception in thread "main" java.io.IOException: Job failed!
>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217)
>    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
>    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
>
> Here is my nutch-default conf file :
>
> <!-- plugin properties -->
>
> <property>
>  <name>plugin.folders</name>
>  <value>../src/plugin</value>
>  <description>Directories where nutch plugins are located.  Each
>  element may be a relative or absolute path.  If absolute, it is used
>  as is.  If relative, it is searched for on the classpath.</description>
> </property>
>
> Someone has an idea ??
>

Try adding your conf directory to your classpath in your run configuration. Also,
I think you will have to make plugin.folders point to an absolute path (that
is, $NUTCH_HOME/build/plugins, not src/plugin).
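
To see whether the conf directory really is on the classpath of a given run configuration, something like the following can be run with the same classpath. It is only a sketch: the class name ConfOnClasspath is invented here, and it assumes the configuration file is looked up by name through the class loader, which is how I understand Hadoop's Configuration to resolve named resources.

```java
import java.net.URL;

// Hypothetical helper: reports where (or whether) a named resource is found on the classpath.
final class ConfOnClasspath {

    /** Returns the URL of resourceName on the classpath, or null if it is not visible. */
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Defaults to the file named in this thread; pass other resource names as arguments.
        String[] names = args.length > 0 ? args : new String[] {"nutch-default.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

A "NOT on classpath" result would be consistent with the "plugin.folders is not defined" error, since the property can only be defined if the file that declares it is actually loaded.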

> TY all
>



-- 
Doğacan Güney