Posted to hdfs-user@hadoop.apache.org by jeremy p <at...@gmail.com> on 2012/10/05 19:21:02 UTC

When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?

By default, Hadoop sets hadoop.tmp.dir to your /tmp folder. This is a
problem, because /tmp gets wiped out by Linux when you reboot, leading to
this lovely error from the JobTracker:

2012-10-05 07:41:13,618 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
...
2012-10-05 07:41:22,636 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
2012-10-05 07:41:22,643 INFO org.apache.hadoop.mapred.JobTracker: problem
cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on
connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
The only way I've found to fix this is to reformat the namenode, which
rebuilds the /tmp/hadoop-root folder; that folder, of course, gets wiped
out again the next time you reboot.
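
For reference, the stock default in core-default.xml is roughly the
following, which is why everything ends up under /tmp/hadoop-<username>
(/tmp/hadoop-root when the daemons run as root):

<!-- default from core-default.xml, shown for reference -->
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
</property>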

So I went ahead and created a folder called /hadoop_temp and gave all users
read/write access to it. I then set this property in my core-site.xml:

<property>
<name>hadoop.tmp.dir</name>
<value>file:///hadoop_temp</value>
</property>

When I reformatted my namenode, Hadoop seemed happy, giving me this message:

12/10/05 07:58:54 INFO common.Storage: Storage directory
file:/hadoop_temp/dfs/name has been successfully formatted.
However, when I looked at /hadoop_temp, I noticed that the folder was
empty. Then, when I restarted Hadoop and checked my JobTracker log, I
saw this:

2012-10-05 08:02:41,988 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
...
2012-10-05 08:02:51,010 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
2012-10-05 08:02:51,011 INFO org.apache.hadoop.mapred.JobTracker: problem
cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on
connection exception: java.net.ConnectException: Connection refused
And when I checked my namenode log, I saw this:

2012-10-05 08:00:31,206 INFO org.apache.hadoop.hdfs.server.common.Storage:
Storage directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name does
not exist.
2012-10-05 08:00:31,212 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException:
Directory /opt/hadoop/hadoop-0.20.2/file:/hadoop_temp/dfs/name is in an
inconsistent state: storage directory does not exist or is not accessible.
So, clearly I didn't configure something right. Hadoop still expects to see
its files in the /tmp folder even though I set hadoop.tmp.dir to
/hadoop_temp in core-site.xml. What did I do wrong? What's the accepted
"right" value for hadoop.tmp.dir?

Bonus question: what should I use for hbase.tmp.dir?

System info:

Ubuntu 12.04, Apache Hadoop 0.20.2, Apache HBase 0.92.1

Thanks for taking a look!

--Jeremy

Re: When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?

Posted by jeremy p <at...@gmail.com>.
Thank you, that worked!

On Fri, Oct 5, 2012 at 10:58 AM, Harsh J <ha...@cloudera.com> wrote:

> On 0.20.x or 1.x based releases, do not use a file:/// prefix for
> hadoop.tmp.dir. That won't work. Remove it and things should work, I
> guess.
>
> And yes, for production, tweak the specific configs (like
> dfs.name.dir, dfs.data.dir, mapred.local.dir, mapred.system.dir (DFS),
> mapreduce.jobtracker.staging.root.dir (DFS)) to point at specific paths
> rather than paths relative to hadoop.tmp.dir, and keep hadoop.tmp.dir at
> /tmp or another temporary local store (non-DFS).
>
> --
> Harsh J
>

Re: When running Hadoop in pseudo-distributed mode, what directory should I use for hadoop.tmp.dir?

Posted by Harsh J <ha...@cloudera.com>.
On 0.20.x or 1.x based releases, do not use a file:/// prefix for
hadoop.tmp.dir. That won't work. Remove it and things should work, I
guess.

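To make it concrete, a minimal sketch of that property with the prefix
dropped (reusing the /hadoop_temp path from your mail; any persistent
local path will do):

<!-- core-site.xml: plain local path, no file:/// scheme -->
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop_temp</value>
</property>
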
And yes, for production, tweak the specific configs (like
dfs.name.dir, dfs.data.dir, mapred.local.dir, mapred.system.dir (DFS),
mapreduce.jobtracker.staging.root.dir (DFS)) to point at specific paths
rather than paths relative to hadoop.tmp.dir, and keep hadoop.tmp.dir at
/tmp or another temporary local store (non-DFS).

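A rough illustration of that split in hdfs-site.xml and mapred-site.xml
(the /data/... paths below are placeholders, not anything from this
thread):

<!-- hdfs-site.xml: persistent local paths for NN metadata and DN blocks -->
<property>
<name>dfs.name.dir</name>
<value>/data/1/dfs/nn</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>

<!-- mapred-site.xml: local scratch space for intermediate map output -->
<property>
<name>mapred.local.dir</name>
<value>/data/1/mapred/local,/data/2/mapred/local</value>
</property>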



-- 
Harsh J
