
Posted to dev@hbase.apache.org by Sergey Soldatov <se...@gmail.com> on 2018/01/02 23:58:05 UTC

Assignment Manager and clock advance.

Hi,
I'm not sure whether this counts as a bug, but I found an interesting
dependency of AMv2 on the clock advancing. Even a simple operation such as
create table cannot complete if the current time value never changes:

@Test
public void testCreateTable() throws IOException {
  // Freeze the clock: every call to currentTime() returns the same value.
  EnvironmentEdgeManager.injectEdge(new EnvironmentEdge() {
    private final long curTime = 1000;

    @Override
    public long currentTime() {
      return curTime;
    }
  });
  final TableName tableName = TableName.valueOf("test");
  TEST_UTIL.createTable(tableName, HConstants.CATALOG_FAMILY).close();
}

It fails with a TableNotFoundException. The reason is that between state
transitions we read the table information from meta using a Get bounded by
the current timestamp, exclusive, so rows written at that same timestamp are
not visible. Could this be a real problem, i.e. is the system capable of
executing all of those transitions in less than 1 ms?

Thanks,
Sergey
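
The effect described above can be reproduced without HBase. The following is a minimal sketch, assuming a store that keeps values by write timestamp and a read that only sees timestamps strictly below its upper bound; the class ExclusiveBoundDemo and its methods are hypothetical stand-ins, not HBase code.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a timestamped store; this is not HBase code.
class ExclusiveBoundDemo {
  // Values keyed by their write timestamp in milliseconds.
  private final NavigableMap<Long, String> cells = new TreeMap<>();

  void put(long timestamp, String value) {
    cells.put(timestamp, value);
  }

  // Returns the newest value whose timestamp is strictly below upperExclusive,
  // or null if there is none.
  String get(long upperExclusive) {
    Map.Entry<Long, String> entry = cells.lowerEntry(upperExclusive);
    return entry == null ? null : entry.getValue();
  }

  public static void main(String[] args) {
    ExclusiveBoundDemo store = new ExclusiveBoundDemo();
    long frozenNow = 1000L;             // the clock never advances
    store.put(frozenNow, "table row");  // write stamped with "now"
    System.out.println(store.get(frozenNow));      // prints "null": write is invisible
    System.out.println(store.get(frozenNow + 1));  // prints "table row": visible after a tick
  }
}
```

With a frozen clock, the read bound never moves past the write timestamp, so the row is never found, which is consistent with the TableNotFoundException reported in the message.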

Re: Assignment Manager and clock advance.

Posted by Apekshit Sharma <ap...@cloudera.com>.

Yeah, you're right.
Unreliable clocks open up a whole different set of issues, which I didn't want
to go into when answering whether the system can execute all of that
transition logic in less than 1 ms. I read that question as genuinely asking
how probable the fabricated scenario is, i.e. whether everything can actually
execute within 1 ms of real time.

But speaking of unreliable clocks, the problem isn't just clocks that fail to
advance, right? A clock can also go backwards.
Sadly, our current EnvironmentEdgeManager isn't capable of handling such
cases, and the HLC (HBASE-14070) effort to fix that hasn't seen much
progress lately :(
So yes, if clocks go bad, those bugs can happen even if an operation
spans seconds in real time.

-- Appy
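
To make the backwards-clock case concrete, here is a minimal sketch of a wrapper that hides backwards steps from callers. It is not HLC and not part of HBase; the class NonDecreasingClock is a hypothetical name, and the wrapper only guarantees that readings never decrease, not that they are distinct.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of HBase: never reports a time earlier
// than one it has already reported.
class NonDecreasingClock {
  private final LongSupplier source;
  private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

  NonDecreasingClock(LongSupplier source) {
    this.source = source;
  }

  long currentTime() {
    long now = source.getAsLong();
    // Keep the larger of the previous reading and the new one.
    return last.accumulateAndGet(now, Math::max);
  }

  public static void main(String[] args) {
    long[] readings = {1000L, 1005L, 990L, 1006L};  // 990 simulates a backwards step
    int[] next = {0};
    NonDecreasingClock clock = new NonDecreasingClock(() -> readings[next[0]++]);
    for (int n = 0; n < readings.length; n++) {
      System.out.println(clock.currentTime());  // prints 1000, 1005, 1005, 1006
    }
  }
}
```

A clamp like this still returns the same value twice when the underlying clock stalls or steps back, so on its own it would not fix the frozen-clock test in this thread.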


On Tue, Jan 2, 2018 at 5:01 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> I don't think these assumptions are reliable. I've seen cases where
> subsequent calls to currentTimeMillis() are non-incrementing on specific
> Linux distributions. Taken in aggregate, the system clock makes progress,
> but those aggregations are on the multi-second scale.

Re: Assignment Manager and clock advance.

Posted by Nick Dimiduk <nd...@gmail.com>.

I don't think these assumptions are reliable. I've seen cases where
subsequent calls to currentTimeMillis() are non-incrementing on specific
Linux distributions. Taken in aggregate, the system clock makes progress,
but those aggregations are on the multi-second scale.

On Tue, Jan 2, 2018 at 4:37 PM Apekshit Sharma <ap...@cloudera.com> wrote:

> Hi Sergey,
>
> Interesting test and find. Makes total sense too.
> However, in a real-world case, any put to the meta table will itself take more
> than a ms, and there is a lot of Procedure framework and other logic in
> between meta accesses, which would make this scenario impossible.
> Specifically, there is a lot of processing between adding the rows in
> CreateTableProcedure and trying to access them in the child
> AssignProcedure(s).
>
> -- Appy

Re: Assignment Manager and clock advance.

Posted by Apekshit Sharma <ap...@cloudera.com>.

Hi Sergey,

Interesting test and find. Makes total sense too.
However, in a real-world case, any put to the meta table will itself take more
than a ms, and there is a lot of Procedure framework and other logic in
between meta accesses, which would make this scenario impossible.
Specifically, there is a lot of processing between adding the rows in
CreateTableProcedure and trying to access them in the child AssignProcedure(s).

-- Appy


Re: Assignment Manager and clock advance.

Posted by Sergey Soldatov <se...@gmail.com>.

Thank you all for the comments, I got the point. I raised the issue because
some of our tests advance the clock manually, and that worked just fine in
the 1.x releases (so the create table operation was atomic from this
perspective). It stops working with AMv2. Of course, we can deal with it by
using an auto-incrementing edge, but my main concern was that even a simple
operation such as create table may hit a negative clock adjustment made by
ntpd or similar to correct time drift, which can (and usually does) happen in
hypervisor environments (Hyper-V, KVM, VMware: they all have that issue under
heavy load). But it seems that the master will retry the subprocedure and it
will eventually complete (I tried the same test with a slowly updating
clock). I will raise the JIRA as a nice-to-have.

Thanks,
Sergey
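
The auto-incrementing edge mentioned above could look roughly like the following. This is a minimal sketch, not the HBase implementation: TimeSource is a hypothetical stand-in that mirrors only the currentTime() method shown in the test at the top of the thread, and AutoIncrementingTimeSource is an illustrative name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the clock interface a test would inject;
// it mirrors only the currentTime() method shown earlier in the thread.
interface TimeSource {
  long currentTime();
}

// Returns a strictly larger value on every call, so two consecutive
// reads never observe the same timestamp.
class AutoIncrementingTimeSource implements TimeSource {
  private final AtomicLong time;

  AutoIncrementingTimeSource(long start) {
    this.time = new AtomicLong(start);
  }

  @Override
  public long currentTime() {
    // Hand out the current value and advance by 1 ms for the next caller.
    return time.getAndIncrement();
  }

  public static void main(String[] args) {
    TimeSource clock = new AutoIncrementingTimeSource(1000L);
    System.out.println(clock.currentTime());  // prints 1000
    System.out.println(clock.currentTime());  // prints 1001
  }
}
```

Because every read advances the clock, a test using such a source keeps deterministic timestamps while still satisfying code that expects time to move forward between operations.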

On Wed, Jan 3, 2018 at 8:00 AM, stack <sa...@gmail.com> wrote:

> Stalling the EnvironmentEdge as is done here does not work. In various areas
> of the internals we expect the clock to advance.  This is not particular to AMv2.
>
> As per Appy, we need HLC generally (it is almost there but needs some
> concentrated effort to carry it the last few yards).
>
> For AMv2, we have a single actor--the Master-- so we should be able to put
> up simple checks that we have an advancing clock.
>
> Please make an issue Sergey and we'll have a go at it.  Thanks for raising
> this issue.
>
> S

Re: Assignment Manager and clock advance.

Posted by stack <sa...@gmail.com>.

Stalling the EnvironmentEdge as is done here does not work. In various areas
of the internals we expect the clock to advance.  This is not particular to AMv2.

As per Appy, we need HLC generally (it is almost there but needs some
concentrated effort to carry it the last few yards).

For AMv2, we have a single actor--the Master-- so we should be able to put
up simple checks that we have an advancing clock.

Please make an issue Sergey and we'll have a go at it.  Thanks for raising
this issue.

S
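
One possible shape for such an advancing-clock check is sketched below. It is only an illustration under stated assumptions, not what AMv2 does: AdvancingClockCheck and awaitAdvance are hypothetical names, and the single actor is assumed to remember the last timestamp it used and to wait until the clock reports something strictly newer.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of HBase.
class AdvancingClockCheck {
  // Polls the source until it reports a time strictly greater than last.
  // Gives up with IllegalStateException after maxWaitMs of real time.
  static long awaitAdvance(LongSupplier source, long last, long maxWaitMs) {
    long deadline = System.nanoTime() + maxWaitMs * 1_000_000L;
    while (true) {
      long now = source.getAsLong();
      if (now > last) {
        return now;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("clock did not advance past " + last);
      }
      try {
        Thread.sleep(1);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();  // preserve the interrupt for callers
        throw new IllegalStateException("interrupted while waiting for the clock", e);
      }
    }
  }

  public static void main(String[] args) {
    long last = System.currentTimeMillis();
    long next = awaitAdvance(System::currentTimeMillis, last, 1000L);
    System.out.println(next > last);  // prints true
  }
}
```

With a frozen clock like the one in the test at the top of the thread, this check fails loudly after the timeout instead of letting a later read silently miss the row.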
