Posted to user@hadoop.apache.org by Manikandan Saravanan <ma...@thesocialpeople.net> on 2014/01/04 18:25:53 UTC
Hadoop doesn't find the input file
Hi,
I’m trying to run Nutch 2.2.1 on a 2-node Hadoop cluster. The cluster is running fine and I’ve successfully added the input and output directories to HDFS. But when I run
$HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job org.apache.nutch.crawl.Crawler urls -dir crawl -depth 3 -topN 5
I’m getting something like:
INFO input.FileInputFormat: Total input paths to process : 0
As I understand it, this means that Hadoop cannot locate the input files. The job then ends, unsurprisingly, with a null pointer exception. Can someone help me out?
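One way to narrow this down is to confirm that the seed directory really contains files at the path Hadoop resolves. The following is only a minimal sketch, assuming the seed directory is the relative HDFS path `urls` from the command above (relative HDFS paths resolve against the user's HDFS home directory). The `count_seed_files` helper and the `HADOOP_BIN` variable are hypothetical names introduced for illustration; they are not part of Hadoop or Nutch.

```shell
#!/bin/sh
# Hypothetical helper: counts the plain files directly under an HDFS directory.
# HADOOP_BIN is an assumed override; it defaults to the launcher used above.
count_seed_files() {
    # "hadoop fs -ls" prints one line per entry. Plain files start with "-"
    # in the permissions column, directories start with "d".
    # grep -c exits non-zero when the count is 0, so "|| true" keeps the
    # function's exit status at 0 while still printing the count.
    "${HADOOP_BIN:-$HADOOP_HOME/bin/hadoop}" fs -ls "$1" 2>/dev/null | grep -c '^-' || true
}

# Usage (needs a running cluster):
#   count_seed_files urls
```

A count of 0 would be consistent with the "Total input paths to process : 0" log line, but it does not by itself prove that the seed directory is the cause.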
--
Manikandan Saravanan
Architect - Technology
TheSocialPeople
Re: Hadoop doesn't find the input file
Posted by Manikandan Saravanan <ma...@thesocialpeople.net>.
Hmm.. I just removed the “crawl” directory (output directory) from the command and it works! I’m storing the output in a Cassandra cluster using Gora anyway. So I don’t think I want to store that on HDFS :)
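For reference, the working invocation presumably looks like the sketch below. This is an assumed reconstruction: it is the original command with the `-dir crawl` option dropped, since the exact command that was run is not shown here. The `run_crawl` wrapper and the `HADOOP_BIN` variable are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Assumed reconstruction of the working command: the original invocation
# with "-dir crawl" removed. The job file path is taken from the original post.
NUTCH_JOB=/nutch/apache-nutch-2.2.1.job

run_crawl() {
    # $1 = seed directory on HDFS, $2 = crawl depth, $3 = topN
    # HADOOP_BIN is a hypothetical override; it defaults to the launcher
    # used in the original command.
    "${HADOOP_BIN:-$HADOOP_HOME/bin/hadoop}" jar "$NUTCH_JOB" \
        org.apache.nutch.crawl.Crawler "$1" -depth "$2" -topN "$3"
}

# Usage (needs a running cluster):
#   run_crawl urls 3 5
```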
--
Manikandan Saravanan
Architect - Technology
TheSocialPeople
On 4 January 2014 at 11:06:56 pm, Ted Yu (yuzhihong@gmail.com) wrote:
> Can you pastebin the stack trace involving the NPE?
>
> Thanks
>
> On Jan 4, 2014, at 9:25 AM, Manikandan Saravanan <ma...@thesocialpeople.net> wrote:
> > Hi,
> >
> > I’m trying to run Nutch 2.2.1 on a 2-node Hadoop cluster. The cluster is running fine and I’ve successfully added the input and output directories to HDFS. But when I run
> >
> > $HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job org.apache.nutch.crawl.Crawler urls -dir crawl -depth 3 -topN 5
> >
> > I’m getting something like:
> >
> > INFO input.FileInputFormat: Total input paths to process : 0
> >
> > As I understand it, this means that Hadoop cannot locate the input files. The job then ends, unsurprisingly, with a null pointer exception. Can someone help me out?
> >
> > --
> > Manikandan Saravanan
> > Architect - Technology
> > TheSocialPeople
Re: Hadoop doesn't find the input file
Posted by Ted Yu <yu...@gmail.com>.
Can you pastebin the stack trace involving the NPE?
Thanks
On Jan 4, 2014, at 9:25 AM, Manikandan Saravanan <ma...@thesocialpeople.net> wrote:
> Hi,
>
> I’m trying to run Nutch 2.2.1 on a 2-node Hadoop cluster. The cluster is running fine and I’ve successfully added the input and output directories to HDFS. But when I run
>
> $HADOOP_HOME/bin/hadoop jar /nutch/apache-nutch-2.2.1.job org.apache.nutch.crawl.Crawler urls -dir crawl -depth 3 -topN 5
>
> I’m getting something like:
>
> INFO input.FileInputFormat: Total input paths to process : 0
>
> As I understand it, this means that Hadoop cannot locate the input files. The job then ends, unsurprisingly, with a null pointer exception. Can someone help me out?
>
> --
> Manikandan Saravanan
> Architect - Technology
> TheSocialPeople