Posted to common-dev@hadoop.apache.org by Shivani Rao <sg...@purdue.edu> on 2011/02/25 23:37:33 UTC

a hadoop input format question

I am running basic hadoop examples on amazon emr and I am stuck at a very
simple place. I am apparently not passing the right "classname" for
inputFormat

From the hadoop documentation, it seems like "TextInputFormat" is a valid option
for the input format.

I am running a simple sort example using mapreduce.

Here are the command variations I tried, all in vain:


$ /usr/local/hadoop/bin/hadoop jar "/path to hadoop examples/hadoop-0.18.0-examples.jar" \
    sort -inFormat TextInputFormat -outFormat TextOutputFormat \
    "/path to datainput/datain/" "/path to data output/dataout"

The sort function does not declare "TextInputFormat" in its import list.
Could that be a problem?
Could it be a version problem?
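For what it's worth, example drivers like Sort typically turn the -inFormat string into a class with Class.forName, which needs a fully qualified name; that this is what the 0.18 Sort driver does is an assumption, not something confirmed in this thread. A minimal sketch of the difference, using java.lang.String as a stand-in class:

```java
// Sketch: why a bare class name can fail where a fully qualified one works.
// (Assumption: the Sort driver resolves -inFormat via Class.forName.)
class ClassNameDemo {
    static String tryResolve(String name) {
        try {
            return Class.forName(name).getSimpleName();
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryResolve("String"));            // bare name: not found
        System.out.println(tryResolve("java.lang.String"));  // fully qualified: found
    }
}
```

If the driver works this way, passing org.apache.hadoop.mapred.TextInputFormat rather than the bare TextInputFormat would be worth trying.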


Any help is appreciated!
Shivani



-- 
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao

Re: a hadoop input format question

Posted by raoshivani <ra...@gmail.com>.
Hello Simon,

I tried with the hadoop-0.20 examples and I still get the input format error for the
sort program. I took a second look at the Sort.java code, and it looks like the
default class is SequenceFileInputFormat:

    Class<? extends InputFormat> inputFormatClass = 
      SequenceFileInputFormat.class;

So if I do not specify a class, I am going to get an input format error.

I am unable to specify the right input format class.
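The quoted snippet is the usual default-plus-override pattern. A self-contained sketch with stand-in classes (the real ones live in org.apache.hadoop.mapred; everything here is illustrative) shows that the default only applies when no -inFormat argument is parsed:

```java
// Sketch of the default-plus-override pattern from the quoted Sort code,
// using stand-in types so the example compiles without Hadoop on the classpath.
class FormatDefaultDemo {
    interface InputFormat {}
    static class SequenceFileInputFormat implements InputFormat {}

    static Class<? extends InputFormat> pickFormat(String arg) throws ClassNotFoundException {
        // default, exactly as in the quoted snippet
        Class<? extends InputFormat> inputFormatClass = SequenceFileInputFormat.class;
        if (arg != null) {
            // an -inFormat argument overrides the default
            inputFormatClass = Class.forName(arg).asSubclass(InputFormat.class);
        }
        return inputFormatClass;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // with no -inFormat argument, the default wins:
        System.out.println(pickFormat(null).getSimpleName());
    }
}
```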

Any ideas?

Regards,
Shivani

--
View this message in context: http://lucene.472066.n3.nabble.com/a-hadoop-input-format-question-tp2588087p2627190.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: a hadoop input format question

Posted by Simon <gs...@gmail.com>.
First, I think your Hadoop version is a bit too old; maybe you can try a version newer than 0.20.
Then try to run the sort example with the following command:
bin/hadoop jar hadoop-*-examples.jar sort [-m <#maps>] [-r <#reduces>]
<in-dir> <out-dir>

HTH.
Simon
On Fri, Feb 25, 2011 at 5:37 PM, Shivani Rao <sg...@purdue.edu> wrote:

> I am running basic hadoop examples on amazon emr and I am stuck at a very
> simple place. I am apparently not passing the right "classname" for
> inputFormat
>
> From hadoop documentation it seems like "TextInputFormat" is a valid option
> for input format
>
> I am running a simple sort example using mapreduce.
>
> Here are the command variations I tried, all in vain:
>
>
> $usr/local/hadoop/bin/hadoop jar /path to hadoop
> examples/hadoop-0.18.0-examples.jar sort -inFormat TextInputFormat
> -outFormat TextOutputFormat /path to datainput/datain/ /path to data
> output/dataout
>
> The sort function does not declare "TextInputFormat" in its import list.
> Could that be a problem?
> Could it be a version problem?
>
>
> Any help is appreciated!
> Shivani
>
>
>
> --
> Research Scholar,
> School of Electrical and Computer Engineering
> Purdue University
> West Lafayette IN
> web.ics.purdue.edu/~sgrao <http://web.ics.purdue.edu/%7Esgrao>
>



-- 
Regards,
Simon

Re: Hadoop 0.21 running problems, no namenode to stop

Posted by rahul patodi <pa...@gmail.com>.
Hi,
Please check the logs; there might be some error that occurred while starting the daemons.
Please post the error.
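Concretely, the daemon logs are the place to look. The hadoop log paths below are typical defaults and an assumption; the runnable part uses a stand-in log file so the snippet is self-contained:

```shell
# Daemon logs usually live under $HADOOP_HOME/logs (an assumption; paths vary):
#   ls -t "$HADOOP_HOME"/logs/*.log | head
#   tail -n 100 "$HADOOP_HOME"/logs/hadoop-*-datanode-*.log
# Demonstrated here on a stand-in log so the commands actually run:
printf '2011-03-03 INFO  startup ok\n2011-03-03 ERROR java.net.BindException: Address already in use\n' > /tmp/hadoop-demo.log
grep -n 'ERROR' /tmp/hadoop-demo.log   # show the failing line with its line number
grep -c 'ERROR' /tmp/hadoop-demo.log   # count of error lines
```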

On Thu, Mar 3, 2011 at 10:24 AM, Shivani Rao <sg...@purdue.edu> wrote:

> Problems running local installation of hadoop on single-node cluster
>
> I followed instructions given by tutorials to run hadoop-0.21 on a single
> node cluster.
>
> The first problem I encountered was that of HADOOP-6953. Thankfully that
> has got fixed.
>
> The other problem I am facing is that the datanode does not start. This I
> guess because when I run stop-dfs.sh  for datanode, I get a message
> "no datanode to stop"
>
> I am wondering if it is related remotely to the difference in the IP
> addresses on my computer
>
> 127.0.0.1       localhost
> 127.0.1.1       my-laptop
>
> Although I am aware of this, I do not know how to fix this.
>
> I am unable even to run a simple pi estimate example on the hadoop
> installation.
>
> This is the output I get is
>
> bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 10 10
> Number of Maps  = 10
> Samples per Map = 10
> 11/03/02 23:38:47 INFO security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=300000
>
> And nothing else for long long time.
>
> I have not set dfs.name.dir and dfs.data.dir in my hdfs-site.xml. But
> after running bin/hadoop namenode -format, I see that the tmp.dir has
> dfs/name and dfs/data folders for the two directories.
>
> what Am I doing wrong? Any help is appreciated.
>
> Here are my configuration files
>
> Regards,
> Shivani
>
> hdfs-site.xml
>
> <property>
>  <name>dfs.replication</name>
>  <value>1</value>
>  <description>Default block replication.
>  The actual number of replications can be specified when the file is
> created.
>  The default is used if replication is not specified in create time.
>  </description>
> </property>
>
>
> core-site.xml
>
> <property>
>  <name>hadoop.tmp.dir</name>
>  <value>/usr/local/hadoop-${user.name}</value>
>  <description>A base for other temporary directories.</description>
> </property>
>
> <property>
>  <name>fs.default.name</name>
>  <value>hdfs://localhost:54310</value>
>  <description>The name of the default file system.  A URI whose
>  scheme and authority determine the FileSystem implementation.  The
>  uri's scheme determines the config property (fs.SCHEME.impl) naming
>  the FileSystem implementation class.  The uri's authority is used to
>  determine the host, port, etc. for a filesystem.</description>
> </property>
>
>
>
> mapred-site.xml
>
> <property>
>  <name>mapred.job.tracker</name>
>  <value>localhost:54311</value>
>  <description>The host and port that the MapReduce job tracker runs
>  at.  If "local", then jobs are run in-process as a single map
>  and reduce task.
>  </description>
> </property>
>
>
>
>

Hadoop 0.21 running problems, no namenode to stop

Posted by Shivani Rao <sg...@purdue.edu>.
Problems running local installation of hadoop on single-node cluster

I followed instructions given by tutorials to run hadoop-0.21 on a single node cluster. 

The first problem I encountered was that of HADOOP-6953. Thankfully that has got fixed.

The other problem I am facing is that the datanode does not start. I guess this because when I run stop-dfs.sh, I get the message
"no datanode to stop" for the datanode.

I am wondering if it is remotely related to the difference in the IP addresses on my computer:

127.0.0.1	localhost 
127.0.1.1	my-laptop 

Although I am aware of this, I do not know how to fix this.
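A commonly suggested workaround (an assumption; nothing in this thread confirms it fixes the datanode issue) is to point the machine's hostname at 127.0.0.1 in /etc/hosts, so the daemons resolve and bind consistently:

```
# /etc/hosts -- hypothetical edit; back up the original first
127.0.0.1   localhost
127.0.0.1   my-laptop
```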

I am unable even to run a simple pi estimate example on the hadoop installation.

This is the output I get:

bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
11/03/02 23:38:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000

And nothing else for a long, long time.

I have not set dfs.name.dir and dfs.data.dir in my hdfs-site.xml. But after running bin/hadoop namenode -format, I see that the tmp.dir has dfs/name and dfs/data folders for the two directories.
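For what it's worth, the storage locations can be pinned explicitly in hdfs-site.xml rather than left under hadoop.tmp.dir. The property names below are the 0.20-era ones and the paths are purely illustrative; whether 0.21 still honors these exact names is an assumption, so treat this as a sketch:

```xml
<!-- Hypothetical explicit settings; values are illustrative -->
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/hadoop-data/dfs/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/hadoop-data/dfs/data</value>
</property>
```

If these are changed after a format, the namenode generally needs to be reformatted (which erases HDFS data), so it is best done on a fresh install.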

What am I doing wrong? Any help is appreciated.

Here are my configuration files

Regards,
Shivani

hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>


core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>



mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>



