Posted to dev@nutch.apache.org by Ferdy Galema <fe...@kalooga.com> on 2011/08/24 17:09:08 UTC

how to use Nutch 1.3 as a single job jar on newer Hadoop releases

Hi,

Compiling Nutch 1.3 with patch NUTCH-993 (newest patch) and configuring 
"mapreduce.job.jar.unpack.pattern" and "plugin.folders" according to 
issue NUTCH-937 still won't allow me to run the stand-alone job jar. 
What else should I patch/configure in order to do so? The command I use 
is "hadoop jar nutch-1.3.job org.apache.nutch.crawl.Crawl /urls -dir 
/root". This results in the stacktrace below. (I guess a plugin folder 
is referenced that does not exist.)

11/08/24 17:05:37 INFO mapred.JobClient: Task Id : attempt_201108231047_0009_m_000000_0, Status : FAILED
java.lang.RuntimeException: Error in configuring object
     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:396)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
     at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
     ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
     ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
     at java.lang.reflect.Method.invoke(Method.java:597)
     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
     ... 17 more
Caused by: java.lang.NullPointerException
     at org.apache.nutch.plugin.PluginManifestParser.parsePluginFolder(PluginManifestParser.java:87)
     at org.apache.nutch.plugin.PluginRepository.<init>(PluginRepository.java:71)
     at org.apache.nutch.plugin.PluginRepository.get(PluginRepository.java:99)
     at org.apache.nutch.net.URLNormalizers.<init>(URLNormalizers.java:117)
     at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:70)
     ... 22 more
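
A side note on the innermost NullPointerException: it would fit the guess that a plugin folder is referenced that does not exist on the task node. What line 87 of PluginManifestParser actually does has not been checked here, so the following is only a sketch of one common way such an error can arise in plain Java: File.listFiles() returns null, not an empty array, for a path that is missing or is not a directory. The names PluginFolderCheck and listPluginDirs, and the folder name used in main, are hypothetical and are not Nutch code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class PluginFolderCheck {

    /** Lists the sub-directories of a plugin folder, tolerating a missing folder. */
    static List<File> listPluginDirs(File pluginFolder) {
        List<File> result = new ArrayList<>();
        // listFiles() returns null when the path does not exist or is not a
        // directory; looping over that null directly would throw a
        // NullPointerException.
        File[] children = pluginFolder.listFiles();
        if (children == null) {
            return result;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical folder name that is assumed not to exist.
        File missing = new File("no-such-plugins-folder-for-this-sketch");
        System.out.println("listFiles() is null: " + (missing.listFiles() == null));
        System.out.println("plugin dirs found: " + listPluginDirs(missing).size());
    }
}
```

If this is indeed the mechanism, the fix is still to make the plugins available to the task, not to hide the null; the guard only turns a bare NullPointerException into something easier to diagnose.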



Re: how to use Nutch 1.3 as a single job jar on newer Hadoop releases

Posted by Ferdy Galema <fe...@kalooga.com>.
Just to avoid confusion, I meant CDH3u0.

On 08/25/2011 10:41 AM, Ferdy Galema wrote:
> Thanks for the quick response. I actually did recompile the jar, so the 
> problem was of a different nature. The problem turned out to be that 
> "mapreduce.job.jar.unpack.pattern" is not a Hadoop client property. 
> It can only be used as a deployment property; in other words, you have 
> to specify it on the Hadoop cluster itself. (At least as of CDH4u0.) 
> I will also update issue NUTCH-937 with this information.

Re: how to use Nutch 1.3 as a single job jar on newer Hadoop releases

Posted by Ferdy Galema <fe...@kalooga.com>.
Thanks for the quick response. I actually did recompile the jar, so the 
problem was of a different nature. The problem turned out to be that 
"mapreduce.job.jar.unpack.pattern" is not a Hadoop client property. 
It can only be used as a deployment property; in other words, you have 
to specify it on the Hadoop cluster itself. (At least as of CDH4u0.) 
I will also update issue NUTCH-937 with this information.
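
To illustrate why the unpack pattern matters for plugins, here is a small sketch of how a regular expression of this kind would select entries of the job jar. The two pattern values and the entry names are assumptions made for illustration (they are not quoted from this thread or from NUTCH-937), and the sketch assumes the pattern must match the whole entry name. UnpackPatternDemo and wouldUnpack are hypothetical names.

```java
import java.util.regex.Pattern;

final class UnpackPatternDemo {

    // Assumed value: a pattern that only unpacks classes/ and lib/ entries.
    static final Pattern ASSUMED_DEFAULT =
            Pattern.compile("(?:classes/|lib/).*");

    // Assumed value: the same pattern extended with a plugins/ directory.
    static final Pattern ASSUMED_WITH_PLUGINS =
            Pattern.compile("(?:classes/|lib/|plugins/).*");

    /** Returns true if the whole entry name matches the unpack pattern. */
    static boolean wouldUnpack(Pattern pattern, String entryName) {
        return pattern.matcher(entryName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical jar entry names.
        String[] entries = {
            "lib/some-dependency.jar",
            "plugins/urlnormalizer-basic/plugin.xml"
        };
        for (String entry : entries) {
            System.out.println(entry
                    + " | narrow pattern: " + wouldUnpack(ASSUMED_DEFAULT, entry)
                    + " | with plugins: " + wouldUnpack(ASSUMED_WITH_PLUGINS, entry));
        }
    }
}
```

Under these assumptions, a plugins/ entry is skipped by the narrower pattern, so the task would never see the plugin folder on local disk unless the cluster-side configuration uses the extended pattern.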

On 08/24/2011 05:25 PM, Julien Nioche wrote:
> Make sure you specify the params in runtime/deploy/conf unless you 
> rebuild the job file with 'ant job'

Re: how to use Nutch 1.3 as a single job jar on newer Hadoop releases

Posted by Julien Nioche <li...@gmail.com>.
Make sure you specify the params in runtime/deploy/conf unless you rebuild
the job file with 'ant job'



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com