You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by Chen Huang <pa...@gmail.com> on 2016/05/03 19:14:23 UTC

Synchronization in CatalogOpExecutor

Hi,

I saw we are doing some synchronization in CatalogOpExecutor:
https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java#L337

Can anybody help me understand what the effect of synchronizing on a
method-local variable tbl?
- Doesn't each thread get its own copy of tbl, which is just a reference to
the actual table object and thus end up synchronize with only itself since
no other threads can ever share the same local copy of tbl?
- Isn't the lock on catalog_ already doing the global synchronization? Why
do we need another synchronization on tbl?

Thanks,
Chen

Re: Synchronization in CatalogOpExecutor

Posted by Chen Huang <pa...@gmail.com>.
It's more clear to me now.

Just in case anybody has the same question, here is what confused me:
Assume we have a real table object T in the catalog and 2 threads are
trying to perform concurrent DDL on T. When both threads enter alterTable()
method,
thread1:
  Table tbl = getExistingTable();  // a local variable tbl referencing to T
  synchronized(tbl) { ... }  // this synchronize on T, not the local
variable tbl
thread2:
  Table tbl = getExistingTable();  // another local variable tbl
referencing to T
   synchronized(tbl) { ... }  // this synchronize on T, not the local
variable tbl

So both threads are synchronized against T and that's how we prevent
concurrent DDL on a table.

Thanks for your help, Tim and Dimitris.

Chen


On Tue, May 3, 2016 at 10:51 AM, Dimitris Tsirogiannis <
dtsirogiannis@cloudera.com> wrote:

> Hi Chen,
>
> The per-table lock protects the tables from concurrent
> accesses/modifications. The table references are shared across all the
> threads that execute DDL/DML operations in the catalog. The catalog_ lock
> protects the entire catalog and shouldn't be held for long running
> operations. The catalog_ lock is only acquired when we want to get a new
> catalog version and it is then released to allow for other concurrent
> operations to make progress. What protects the table modifications is the
> synchronized blocks that use the table references for locks.
>
> Hope that helps.
>
> Dimitris
>
> On Tue, May 3, 2016 at 10:27 AM, Tim Armstrong <ta...@cloudera.com>
> wrote:
>
> > Hi Chen,
> >   Objects in Java are assigned by reference so they should all be
> > referencing the same underlying object (i.e. there aren't separate local
> > copies). I don't think getExistingTable() copies the object.
> >
> > I'm not especially familiar with the catalog's locking scheme, but it
> > doesn't look like it holds the global catalog lock for the whole table
> > operation - it unlocks the catalog lock before making most of the table
> > modifications.
> >
> > - Tim
> >
> > On Tue, May 3, 2016 at 10:14 AM, Chen Huang <pa...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > I saw we are doing some synchronization in CatalogOpExecutor:
> > >
> > >
> >
> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java#L337
> > >
> > > Can anybody help me understand what the effect of synchronizing on a
> > > method-local variable tbl?
> > > - Doesn't each thread get its own copy of tbl, which is just a
> reference
> > to
> > > the actual table object and thus end up synchronize with only itself
> > since
> > > no other threads can ever share the same local copy of tbl?
> > > - Isn't the lock on catalog_ already doing the global synchronization?
> > Why
> > > do we need another synchronization on tbl?
> > >
> > > Thanks,
> > > Chen
> > >
> >
>

Re: Synchronization in CatalogOpExecutor

Posted by Dimitris Tsirogiannis <dt...@cloudera.com>.
Hi Chen,

The per-table lock protects the tables from concurrent
accesses/modifications. The table references are shared across all the
threads that execute DDL/DML operations in the catalog. The catalog_ lock
protects the entire catalog and shouldn't be held for long running
operations. The catalog_ lock is only acquired when we want to get a new
catalog version and it is then released to allow for other concurrent
operations to make progress. What protects the table modifications is the
synchronized blocks that use the table references for locks.

Hope that helps.

Dimitris

On Tue, May 3, 2016 at 10:27 AM, Tim Armstrong <ta...@cloudera.com>
wrote:

> Hi Chen,
>   Objects in Java are assigned by reference so they should all be
> referencing the same underlying object (i.e. there aren't separate local
> copies). I don't think getExistingTable() copies the object.
>
> I'm not especially familiar with the catalog's locking scheme, but it
> doesn't look like it holds the global catalog lock for the whole table
> operation - it unlocks the catalog lock before making most of the table
> modifications.
>
> - Tim
>
> On Tue, May 3, 2016 at 10:14 AM, Chen Huang <pa...@gmail.com> wrote:
>
> > Hi,
> >
> > I saw we are doing some synchronization in CatalogOpExecutor:
> >
> >
> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java#L337
> >
> > Can anybody help me understand what the effect of synchronizing on a
> > method-local variable tbl?
> > - Doesn't each thread get its own copy of tbl, which is just a reference
> to
> > the actual table object and thus end up synchronize with only itself
> since
> > no other threads can ever share the same local copy of tbl?
> > - Isn't the lock on catalog_ already doing the global synchronization?
> Why
> > do we need another synchronization on tbl?
> >
> > Thanks,
> > Chen
> >
>

Re: Synchronization in CatalogOpExecutor

Posted by Tim Armstrong <ta...@cloudera.com>.
Hi Chen,
  Objects in Java are assigned by reference so they should all be
referencing the same underlying object (i.e. there aren't separate local
copies). I don't think getExistingTable() copies the object.

I'm not especially familiar with the catalog's locking scheme, but it
doesn't look like it holds the global catalog lock for the whole table
operation - it unlocks the catalog lock before making most of the table
modifications.

- Tim

On Tue, May 3, 2016 at 10:14 AM, Chen Huang <pa...@gmail.com> wrote:

> Hi,
>
> I saw we are doing some synchronization in CatalogOpExecutor:
>
> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/com/cloudera/impala/service/CatalogOpExecutor.java#L337
>
> Can anybody help me understand what the effect of synchronizing on a
> method-local variable tbl?
> - Doesn't each thread get its own copy of tbl, which is just a reference to
> the actual table object and thus end up synchronize with only itself since
> no other threads can ever share the same local copy of tbl?
> - Isn't the lock on catalog_ already doing the global synchronization? Why
> do we need another synchronization on tbl?
>
> Thanks,
> Chen
>