Posted to hdfs-dev@hadoop.apache.org by Azuryy Yu <az...@gmail.com> on 2013/06/25 11:39:00 UTC

A question for txid

Hi all,

The txid is currently a long type.

FSImage.java:

boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
    throws IOException {
  // ... image and edit log loading elided ...
  editLog.setNextTxId(lastAppliedTxId + 1L);
}

Is it possible for (lastAppliedTxId + 1L) to exceed Long.MAX_VALUE?
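
For context, Java long arithmetic overflows silently rather than throwing:
adding 1 to Long.MAX_VALUE wraps around to Long.MIN_VALUE. A minimal
standalone demo (plain Java, not HDFS code):

public class TxidOverflowDemo {
  public static void main(String[] args) {
    long lastAppliedTxId = Long.MAX_VALUE;  // 9223372036854775807, 2^63 - 1
    long nextTxId = lastAppliedTxId + 1L;   // wraps silently, no exception
    System.out.println(nextTxId);                   // -9223372036854775808
    System.out.println(nextTxId == Long.MIN_VALUE); // true
  }
}

So if the counter ever reached the maximum, the increment itself would not
fail; the txid would simply go negative.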

Re: A question for txid

Posted by Azuryy Yu <az...@gmail.com>.
Thanks Harsh, Todd.

In 200 million years, when spacemen manage the earth, they will still know
Hadoop, but they won't be able to restart it; after some hard debugging they
will find that the txid overflowed years ago.

--Sent from my Sony mobile.
On Jun 25, 2013 10:52 PM, "Todd Lipcon" <to...@cloudera.com> wrote:

> I did some back-of-the-envelope math when implementing txids, and
> determined that overflow is never going to happen... A "busy" namenode
> does about 1000 write transactions/second (2^10). MAX_LONG is 2^63, so the
> log can hold 2^63 transactions. A year is about 2^25 seconds. So, at 1k
> tps, you can run your namenode for 2^(63-10-25) = 2^28, about 268 million
> years.
>
> Hadoop is great software and I'm sure it will be around for years to come,
> but if it's still running in 268 million years, that will be a pretty
> depressing rate of technological progress!
>
> -Todd
>
> On Tue, Jun 25, 2013 at 6:14 AM, Harsh J <ha...@cloudera.com> wrote:
>
> > Yes, it logically can, if there have been that many transactions (it's
> > a very, very large number to reach, though).
> >
> > Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.
> >
> > I hacked up my local NN's txids manually to go very large (close to
> > max) and decided to see whether this causes any harm. I basically
> > bumped the freshly formatted starting txid up to 9223372036854775805
> > (and ensured the image references the same):
> >
> > ➜  current  ls
> > VERSION
> > fsimage_9223372036854775805.md5
> > fsimage_9223372036854775805
> > seen_txid
> > ➜  current  cat seen_txid
> > 9223372036854775805
> >
> > NameNode started up as expected.
> >
> > 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
> > in 0 seconds.
> > 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
> > 9223372036854775805 from
> > /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> > 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
> > 9223372036854775806
> >
> > I could create a bunch of files and do regular ops (with the txid
> > counter incrementing well past the long max). I created over 100 files,
> > just to make it go well over Long.MAX_VALUE.
> >
> > Quitting the NameNode and restarting fails, though, with the following error:
> >
> > 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> > java.io.IOException: Gap in transactions. Expected to be able to read
> > up until at least txid 9223372036854775806 but unable to find any edit
> > logs containing txid -9223372036854775808
> >
> > So it looks like it cannot currently handle an overflow.
> >
> > I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
> > this. I don't think this is of immediate concern, though, so we should
> > be able to address it in the future (unless there are parts of the code
> > that already prevent reaching this number in the first place -
> > please do correct me if there is such a part).
> >
> > On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu <az...@gmail.com> wrote:
> > > Hi all,
> > >
> > > The txid is currently a long type.
> > >
> > > FSImage.java:
> > >
> > > boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> > >     throws IOException {
> > >   // ... image and edit log loading elided ...
> > >   editLog.setNextTxId(lastAppliedTxId + 1L);
> > > }
> > >
> > > Is it possible for (lastAppliedTxId + 1L) to exceed Long.MAX_VALUE?
> >
> >
> >
> > --
> > Harsh J
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: A question for txid

Posted by Todd Lipcon <to...@cloudera.com>.
I did some back-of-the-envelope math when implementing txids, and
determined that overflow is never going to happen... A "busy" namenode
does about 1000 write transactions/second (2^10). MAX_LONG is 2^63, so the
log can hold 2^63 transactions. A year is about 2^25 seconds. So, at 1k
tps, you can run your namenode for 2^(63-10-25) = 2^28, about 268 million
years.
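
A quick standalone check of those powers of two (plain Java, just
illustrating the estimate, not anything from the codebase):

public class TxidLifetime {
  public static void main(String[] args) {
    // 2^63 possible txids, at ~2^10 write tx/sec, with ~2^25 seconds/year:
    long years = 1L << (63 - 10 - 25);  // 2^28
    System.out.println(years);          // 268435456, i.e. ~268 million years
    // (2^25 seconds is actually about 388 days, so this is rough, but the
    // conclusion doesn't change.)
  }
}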

Hadoop is great software and I'm sure it will be around for years to come,
but if it's still running in 268 million years, that will be a pretty
depressing rate of technological progress!

-Todd

On Tue, Jun 25, 2013 at 6:14 AM, Harsh J <ha...@cloudera.com> wrote:

> Yes, it logically can, if there have been that many transactions (it's a
> very, very large number to reach, though).
>
> Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.
>
> I hacked up my local NN's txids manually to go very large (close to
> max) and decided to see whether this causes any harm. I basically
> bumped the freshly formatted starting txid up to 9223372036854775805
> (and ensured the image references the same):
>
> ➜  current  ls
> VERSION
> fsimage_9223372036854775805.md5
> fsimage_9223372036854775805
> seen_txid
> ➜  current  cat seen_txid
> 9223372036854775805
>
> NameNode started up as expected.
>
> 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
> in 0 seconds.
> 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
> 9223372036854775805 from
> /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
> 9223372036854775806
>
> I could create a bunch of files and do regular ops (with the txid
> counter incrementing well past the long max). I created over 100 files,
> just to make it go well over Long.MAX_VALUE.
>
> Quitting the NameNode and restarting fails, though, with the following error:
>
> 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> java.io.IOException: Gap in transactions. Expected to be able to read
> up until at least txid 9223372036854775806 but unable to find any edit
> logs containing txid -9223372036854775808
>
> So it looks like it cannot currently handle an overflow.
>
> I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
> this. I don't think this is of immediate concern, though, so we should
> be able to address it in the future (unless there are parts of the code
> that already prevent reaching this number in the first place -
> please do correct me if there is such a part).
>
> On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu <az...@gmail.com> wrote:
> > Hi all,
> >
> > The txid is currently a long type.
> >
> > FSImage.java:
> >
> > boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
> >     throws IOException {
> >   // ... image and edit log loading elided ...
> >   editLog.setNextTxId(lastAppliedTxId + 1L);
> > }
> >
> > Is it possible for (lastAppliedTxId + 1L) to exceed Long.MAX_VALUE?
>
>
>
> --
> Harsh J
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: A question for txid

Posted by Harsh J <ha...@cloudera.com>.
Yes, it logically can, if there have been that many transactions (it's a
very, very large number to reach, though).

Long.MAX_VALUE is (2^63 - 1) or 9223372036854775807.

I hacked up my local NN's txids manually to go very large (close to
max) and decided to see whether this causes any harm. I basically
bumped the freshly formatted starting txid up to 9223372036854775805
(and ensured the image references the same):

➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805

NameNode started up as expected.

13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded
in 0 seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid
9223372036854775805 from
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at
9223372036854775806

I could create a bunch of files and do regular ops (with the txid
counter incrementing well past the long max). I created over 100 files,
just to make it go well over Long.MAX_VALUE.

Quitting the NameNode and restarting fails, though, with the following error:

13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read
up until at least txid 9223372036854775806 but unable to find any edit
logs containing txid -9223372036854775808

So it looks like it cannot currently handle an overflow.
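
Which makes sense: the first txid past Long.MAX_VALUE wraps to
Long.MIN_VALUE, so any consistency check that assumes txids only increase
can never be satisfied. A toy illustration of the failure mode (my sketch,
not the actual FSImage/FSEditLog recovery code):

public class GapCheckDemo {
  public static void main(String[] args) {
    long expectedTxId = Long.MAX_VALUE;       // highest valid txid
    long nextSegmentTxId = expectedTxId + 1;  // wraps to -9223372036854775808
    // A recovery check expecting the next segment to start at or after the
    // expected txid will always report a gap, because the wrapped id sorts
    // before every legitimate txid:
    if (nextSegmentTxId < expectedTxId) {
      System.out.println("Gap in transactions: expected " + expectedTxId
          + " but next segment starts at " + nextSegmentTxId);
    }
  }
}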

I've filed https://issues.apache.org/jira/browse/HDFS-4936 to discuss
this. I don't think this is of immediate concern, though, so we should
be able to address it in the future (unless there are parts of the code
that already prevent reaching this number in the first place -
please do correct me if there is such a part).

On Tue, Jun 25, 2013 at 3:09 PM, Azuryy Yu <az...@gmail.com> wrote:
> Hi all,
>
> The txid is currently a long type.
>
> FSImage.java:
>
> boolean loadFSImage(FSNamesystem target, MetaRecoveryContext recovery)
>     throws IOException {
>   // ... image and edit log loading elided ...
>   editLog.setNextTxId(lastAppliedTxId + 1L);
> }
>
> Is it possible for (lastAppliedTxId + 1L) to exceed Long.MAX_VALUE?



-- 
Harsh J