Posted to common-dev@hadoop.apache.org by Shivani Rao <sg...@purdue.edu> on 2011/02/25 23:37:33 UTC
a hadoop input format question
I am running the basic Hadoop examples on Amazon EMR and I am stuck at a very
simple place: I am apparently not passing the right class name for the
input format.
From the Hadoop documentation, it seems that "TextInputFormat" is a valid option
for the input format.
I am running the simple sort example using MapReduce.
Here is one of the command variations I tried, all in vain:
$ /usr/local/hadoop/bin/hadoop jar /path to hadoop
examples/hadoop-0.18.0-examples.jar sort -inFormat TextInputFormat
-outFormat TextOutputFormat /path to datainput/datain/ /path to data
output/dataout
The sort example does not declare "TextInputFormat" in its import list.
Could that be the problem?
Could it be a version problem?
Any help is appreciated!
Shivani
--
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao
Re: a hadoop input format question
Posted by raoshivani <ra...@gmail.com>.
Hello Simon,
I tried with the hadoop-0.20 examples and I still get the input format error for the
sort program. I took a second look at the Sort.java code, and it looks like the
default class is SequenceFileInputFormat:
Class<? extends InputFormat> inputFormatClass =
SequenceFileInputFormat.class;
So if I do not specify a class, I am going to get an input format error,
and I am unable to specify the right input format class.
Any ideas?
Regards,
Shivani
--
View this message in context: http://lucene.472066.n3.nabble.com/a-hadoop-input-format-question-tp2588087p2627190.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
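A note on the -inFormat question in this thread: the import list of Sort.java should not matter here, because a class named on the command line is looked up by name at run time. As far as I can tell, Sort.java resolves the -inFormat value with Class.forName(...).asSubclass(InputFormat.class), and Class.forName needs a fully qualified name such as org.apache.hadoop.mapred.TextInputFormat rather than the bare TextInputFormat. The sketch below reproduces the same kind of lookup with standard-library classes so that it runs without Hadoop; the class name ResolveByName and the helper resolve are made up for illustration.

```java
import java.util.List;

public class ResolveByName {
    // Load a class by name and check that it is a subtype of the expected base type.
    // This is the same pattern as Class.forName(name).asSubclass(InputFormat.class).
    static <T> Class<? extends T> resolve(String name, Class<T> base)
            throws ClassNotFoundException {
        return Class.forName(name).asSubclass(base);
    }

    public static void main(String[] args) throws Exception {
        // A fully qualified name resolves.
        System.out.println(resolve("java.util.ArrayList", List.class).getName());

        // A simple name does not: Class.forName has no import list to consult.
        try {
            resolve("ArrayList", List.class);
        } catch (ClassNotFoundException e) {
            System.out.println("could not resolve simple name: " + e.getMessage());
        }
    }
}
```

If that reading of Sort.java is right, passing -inFormat org.apache.hadoop.mapred.TextInputFormat (and likewise org.apache.hadoop.mapred.TextOutputFormat for -outFormat) would be the thing to try. Since TextInputFormat produces LongWritable keys and Text values, the key and value classes may also need to be set to match; I believe Sort accepts -outKey and -outValue for this, but please check the usage message of your version.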
Re: a hadoop input format question
Posted by Simon <gs...@gmail.com>.
Firstly, I think your Hadoop version is a bit too old; maybe you can try
version 0.20 or later.
Then try to run the sort sample with the following command:
bin/hadoop jar hadoop-*-examples.jar sort [-m <#maps>] [-r <#reduces>]
<in-dir> <out-dir>
HTH.
Simon
On Fri, Feb 25, 2011 at 5:37 PM, Shivani Rao <sg...@purdue.edu> wrote:
> I am running basic hadoop examples on amazon emr and I am stuck at a very
> simple place. I am apparently not passing the right "classname" for
> inputFormat
>
> From hadoop documentation it seems like "TextInputFormat" is a valid option
> for input format
>
> I am running a simple sort example using mapreduce.
>
> Here is the command variations I tried, all to vain:
>
>
> $usr/local/hadoop/bin/hadoop jar /path to hadoop
> examples/hadoop-0.18.0-examples.jar sort -inFormat TextInputFormat
> -outFormat TextOutputFormat /path to datainput/datain/ /path to data
> output/dataout
>
> The sort function does not declare "TextInputFormat" in its import list.
> Could that be a problem
> ?
> Could it be a version problem?
>
>
> Any help is aprpeciated!
> Shivani
>
>
>
> --
> Research Scholar,
> School of Electrical and Computer Engineering
> Purdue University
> West Lafayette IN
> web.ics.purdue.edu/~sgrao <http://web.ics.purdue.edu/%7Esgrao>
>
--
Regards,
Simon
Re: Hadoop 0.21 running problems , no namenode to stop
Posted by rahul patodi <pa...@gmail.com>.
Hi,
Please check the logs; some error might have occurred while starting the daemons.
Please post the error.
On Thu, Mar 3, 2011 at 10:24 AM, Shivani Rao <sg...@purdue.edu> wrote:
> Problems running local installation of hadoop on single-node cluster
>
> I followed instructions given by tutorials to run hadoop-0.21 on a single
> node cluster.
>
> The first problem I encountered was that of HADOOP-6953. Thankfully that
> has got fixed.
>
> The other problem I am facing is that the datanode does not start. This I
> guess because when I run stop-dfs.sh for datanode, I get a message
> "no datanode to stop"
>
> I am wondering if it is related remotely to the difference in the IP
> addresses on my computer
>
> 127.0.0.1 localhost
> 127.0.1.1 my-laptop
>
> Although I am aware of this, I do not know how to fix this.
>
> I am unable to even run a simple pi estimate example on the haddop
> installation
>
> This is the output I get is
>
> bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 10 10
> Number of Maps = 10
> Samples per Map = 10
> 11/03/02 23:38:47 INFO security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
> cacheTimeout=300000
>
> And nothing else for long long time.
>
> I have not set the dfs.namedir and dfs.datadir in my hdfs-site.xml. But
> After running bin/hadoop namenode -format, I see that the tmp.dir has a
> folder with dfs/data and dfs/data folders for the two directories.
>
> what Am I doing wrong? Any help is appreciated.
>
> Here are my configuration files
>
> Regards,
> Shivani
>
> hdfs-site.xml
>
> <property>
> <name>dfs.replication</name>
> <value>1</value>
> <description>Default block replication.
> The actual number of replications can be specified when the file is
> created.
> The default is used if replication is not specified in create time.
> </description>
> </property>
>
>
> core-site.xml
>
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/usr/local/hadoop-${user.name}</value>
> <description>A base for other temporary directories.</description>
> </property>
>
> <property>
> <name>fs.default.name</name>
> <value>hdfs://localhost:54310</value>
> <description>The name of the default file system. A URI whose
> scheme and authority determine the FileSystem implementation. The
> uri's scheme determines the config property (fs.SCHEME.impl) naming
> the FileSystem implementation class. The uri's authority is used to
> determine the host, port, etc. for a filesystem.</description>
> </property>
>
>
>
> mapred-site.xml
>
> <property>
> <name>mapred.job.tracker</name>
> <value>localhost:54311</value>
> <description>The host and port that the MapReduce job tracker runs
> at. If "local", then jobs are run in-process as a single map
> and reduce task.
> </description>
> </property>
>
>
>
>
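Besides the logs, a quick way to see whether the daemons came up at all is to test whether anything is listening on the ports from the quoted configuration (54310 from fs.default.name, 54311 from mapred.job.tracker). The following is a minimal sketch using only java.net; the class name PortCheck, the helper isListening, and the 2-second timeout are assumptions for illustration and are not part of Hadoop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from the core-site.xml and mapred-site.xml quoted above.
        System.out.println("NameNode   localhost:54310 listening: "
                + isListening("localhost", 54310, 2000));
        System.out.println("JobTracker localhost:54311 listening: "
                + isListening("localhost", 54311, 2000));
    }
}
```

If nothing is listening on 54310, the NameNode probably did not start, and a client such as the pi example may keep retrying the connection, which could look like a hang. The NameNode and DataNode logs should then say why the daemons exited.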
Hadoop 0.21 running problems , no namenode to stop
Posted by Shivani Rao <sg...@purdue.edu>.
Problems running a local installation of Hadoop on a single-node cluster
I followed the instructions given in tutorials to run hadoop-0.21 on a single-node cluster.
The first problem I encountered was HADOOP-6953. Thankfully, that has been fixed.
The other problem I am facing is that the datanode does not start. I infer this because when I run stop-dfs.sh, I get the message
"no datanode to stop"
I am wondering if it is remotely related to the different IP addresses on my computer:
127.0.0.1 localhost
127.0.1.1 my-laptop
Although I am aware of this, I do not know how to fix it.
I am unable to run even a simple pi estimation example on the Hadoop installation.
This is the output I get:
bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
11/03/02 23:38:47 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
And nothing else for a long, long time.
I have not set dfs.name.dir and dfs.data.dir in my hdfs-site.xml, but after running bin/hadoop namenode -format, I see that the tmp dir contains dfs/name and dfs/data folders for the two directories.
What am I doing wrong? Any help is appreciated.
Here are my configuration files.
Regards,
Shivani
hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-${user.name}</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
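On the two addresses mentioned in this message: both 127.0.0.1 and 127.0.1.1 fall in the IPv4 loopback range 127.0.0.0/8, and a 127.0.1.1 entry for the machine's hostname is a common default on Debian and Ubuntu systems. Since the configuration above uses localhost throughout, the pair of entries is not obviously the cause, though that is only a guess without the daemon logs. The sketch below checks the loopback property with java.net and prints what the local hostname resolves to; the class name LoopbackCheck and the helper isLoopback are made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoopbackCheck {
    // True if the literal IP address lies in the loopback range (127.0.0.0/8 for IPv4).
    static boolean isLoopback(String ipLiteral) throws UnknownHostException {
        // For a literal IP address, getByName only parses the text; no name lookup is done.
        return InetAddress.getByName(ipLiteral).isLoopbackAddress();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("127.0.0.1 is loopback: " + isLoopback("127.0.0.1"));
        System.out.println("127.0.1.1 is loopback: " + isLoopback("127.0.1.1"));

        // What this machine's own hostname resolves to depends on the hosts file and DNS.
        InetAddress self = InetAddress.getLocalHost();
        System.out.println(self.getHostName() + " resolves to " + self.getHostAddress());
    }
}
```

If the daemons were ever configured with the machine's hostname (my-laptop in the example) instead of localhost, the last line shows which address they would bind to or connect to.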