Posted to user@nutch.apache.org by Imtiaz Shakil Siddique <sh...@gmail.com> on 2015/09/11 16:14:41 UTC

Compatible Hadoop version with Nutch 1.10

Hi,

I was trying to test Nutch 1.10 with Hadoop 2.7.1, but during the inject
phase I came across some errors.

> I was executing $NUTCH_HOME/runtime/deploy/bin/crawl -i /home/nutch/urls /home/nutch/crawl/ 1

15/09/10 19:41:17 ERROR crawl.Injector: Injector:
> java.lang.IllegalArgumentException: Wrong FS:
> hdfs://localhost:9000/user/root/inject-temp-875522145, expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:646)
> at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:601)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596)
> at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1437)
> at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:506)
> at org.apache.nutch.crawl.CrawlDb.install(CrawlDb.java:168)
> at org.apache.nutch.crawl.Injector.inject(Injector.java:356)
> at org.apache.nutch.crawl.Injector.run(Injector.java:379)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.nutch.crawl.Injector.main(Injector.java:369)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
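For readers hitting the same trace: the "Wrong FS" exception means the client resolved its default filesystem as the local one (file:///) while the Injector's temp path lives on HDFS, so the path check in FileSystem.checkPath rejects it. The sketch below is only an illustration of that scheme comparison using java.net.URI; it is not Hadoop's actual implementation, and the class name WrongFsCheck is made up for the example.

```java
import java.net.URI;

// Illustrative sketch of the scheme check that FileSystem.checkPath
// performs; not Hadoop's real code. A path whose scheme differs from
// the filesystem's own URI scheme is rejected.
public class WrongFsCheck {
    static void checkPath(URI fsUri, URI path) {
        String expected = fsUri.getScheme();
        String actual = path.getScheme();
        // A path with no scheme is resolved relative to the filesystem,
        // so only an explicit, different scheme is an error.
        if (actual != null && !actual.equalsIgnoreCase(expected)) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected: " + expected + ":///");
        }
    }

    public static void main(String[] args) {
        URI localFs = URI.create("file:///");
        URI hdfsPath =
            URI.create("hdfs://localhost:9000/user/root/inject-temp-875522145");
        try {
            checkPath(localFs, hdfsPath); // mirrors the failure in the log
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the real failure the local filesystem was selected because the job client did not pick up the cluster's core-site.xml, not because the path itself was malformed.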


My conf files (hadoop-2.7.1) are given below.
-------- core-site.xml --------

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/nutch/hadoopData/hadoopTmpDir</value>
</property>

-------- hdfs-site.xml --------

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/nutch/hadoopData/nameNodeData</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/nutch/hadoopData/dataNodeData</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

-------- mapred-site.xml --------

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>/home/nutch/hadoopData/mapredJobTrackerData</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>/home/nutch/hadoopData/mapredTaskTrackerData</value>
</property>
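A side note on the core-site.xml above: in Hadoop 2.x the key fs.default.name is deprecated in favour of fs.defaultFS, and when neither key reaches the client's classpath, Hadoop falls back to its built-in default of file:/// — which is exactly the filesystem the error message says it expected. The snippet below is a simplified, pure-Java sketch of that fallback order (the class and method names are invented for illustration; real Hadoop reads these keys through its Configuration class):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of how the default filesystem URI is resolved;
// not Hadoop's actual code. fs.defaultFS wins over the deprecated
// fs.default.name, and the hard-coded fallback is file:///.
public class DefaultFsResolver {
    static String resolveDefaultFs(Map<String, String> conf) {
        String fs = conf.get("fs.defaultFS");
        if (fs == null) {
            fs = conf.get("fs.default.name"); // deprecated alias
        }
        return fs != null ? fs : "file:///";  // built-in default
    }

    public static void main(String[] args) {
        Map<String, String> loaded = new HashMap<>();
        loaded.put("fs.default.name", "hdfs://localhost:9000");
        // Config visible: the deprecated key still resolves to HDFS.
        System.out.println(resolveDefaultFs(loaded));
        // Config not on the classpath: silently falls back to file:///.
        System.out.println(resolveDefaultFs(new HashMap<>()));
    }
}
```

So a "Wrong FS ... expected: file:///" error usually points to the second case: the process that launched the job never saw this core-site.xml.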

But the same command works successfully when I use Hadoop 1.2.1.
What is the preferred version of Hadoop that we should use with Apache
Nutch 1.10?


Thank you so much.
Imtiaz Shakil Siddique

Re: Compatible Hadoop version with Nutch 1.10

Posted by Imtiaz Shakil Siddique <sh...@gmail.com>.
Hi,

Nutch 1.10 is already released. Are you referring to a newer Nutch
release (Nutch 1.11)?

Thanks for the help, Sir.
Imtiaz Shakil Siddique
On Sep 14, 2015 8:57 PM, "Sebastian Nagel" <wa...@googlemail.com>
wrote:

> Hi,
>
> Nutch 1.10 is supposed to run with Hadoop 1.2.0.
> 1.10 (to be released soon) will run with 2.4.0,
> and probably also with newer Hadoop versions.
>
> If you need Nutch with a recent Hadoop version
> right now, you could build it by yourself from trunk.
>
> Cheers,
> Sebastian
>
> 2015-09-11 16:14 GMT+02:00 Imtiaz Shakil Siddique <shakilsust006@gmail.com>:
>
> > [...]

Re: Compatible Hadoop version with Nutch 1.10

Posted by Sebastian Nagel <wa...@googlemail.com>.
Hi,

Nutch 1.10 is supposed to run with Hadoop 1.2.0.
1.10 (to be released soon) will run with 2.4.0,
and probably also with newer Hadoop versions.

If you need Nutch with a recent Hadoop version
right now, you could build it by yourself from trunk.

Cheers,
Sebastian

2015-09-11 16:14 GMT+02:00 Imtiaz Shakil Siddique <sh...@gmail.com>:

> [...]