Posted to dev@impala.apache.org by yu feng <ol...@gmail.com> on 2017/06/06 12:04:44 UTC

Metadata in cluster is out of sync

Hi Impala community,

I have been using Impala in our environment. Here is our cluster deployment:
20+ impalad backends.
4 of the impalads act as coordinators.
One catalogd and one statestored.


I encountered a problem where one impalad's metadata was out of sync after
a catalogd restart. I found that a DML operation was executing while
catalogd was restarting.
After analyzing the Impala source code, I was able to reproduce the
problem. These are my steps and analysis:

1. Start the Impala cluster.
2. The cluster runs for a long time with lots of metadata operations, so
the current catalogVersion_ is large (for example, greater than 10000).
3. Submit a DML query (such as 'insert into xx partition() select xxx') to
one impalad; the query runs for about 1 minute.
4. While the query is running, stop catalogd, then start it again just
before the query executes QueryExecState->UpdateCatalog().
5. UpdateCatalog() sends an UpdateCatalog request to catalogd; catalogd
updates the table's metadata and responds with the newest metadata of
the table.
6. After catalogd responds, UpdateCatalog() updates the metadata cached in
the impalad (by calling updateCatalogCache()), which runs the following code:

    if (!catalogServiceId_.equals(req.getCatalog_service_id())) {
      boolean firstRun = catalogServiceId_.equals(INITIAL_CATALOG_SERVICE_ID);
      catalogServiceId_ = req.getCatalog_service_id();
      if (!firstRun) {
        // Throw an exception which will trigger a full topic update request.
        throw new CatalogException("Detected catalog service ID change. Aborting " +
            "updateCatalog()");
      }
    }

The service ID in the response belongs to the newly started catalogd and does
not equal the impalad's catalogServiceId_, so the function throws a
CatalogException and the query fails with an EXCEPTION. What is more, the
impalad's catalogServiceId_ has already been set to the new one.

7. After catalogd has started successfully, it publishes all metadata to
statestored, which then pushes it to the impalad. Because of step 6, the
impalad's catalogServiceId_ already equals the new catalogd's service ID, so
no exception is thrown.

8. Normally, step 7 would throw the CatalogException, which sets
from_version to 0 so that statestored sends the full metadata to the impalad
in the next UpdateState().

9. After all steps finish, the impalad is out of sync. All new metadata
operations are lost because CatalogObjectCache.add() requires that a 'new
item will only be added if it has a larger catalog version'.
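
To make step 9 concrete, here is a minimal sketch of a version-gated cache. It is not Impala's CatalogObjectCache: the class VersionedCacheSketch, its methods, and the string payloads are hypothetical, and it assumes that a restarted catalogd starts numbering catalog versions from a small value again, which is what the steps above imply.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a version-gated metadata cache.
// Names and types are illustrative and are not taken from Impala.
class VersionedCacheSketch {
  // A cached object is reduced to a catalog version and an opaque payload.
  static final class Entry {
    final long version;
    final String payload;

    Entry(long version, String payload) {
      this.version = version;
      this.payload = payload;
    }
  }

  private final Map<String, Entry> cache = new HashMap<>();

  // Adds the entry only if its version is larger than the cached one.
  // Returns true if the cache was changed.
  boolean add(String name, long version, String payload) {
    Entry existing = cache.get(name);
    if (existing != null && existing.version >= version) {
      return false; // Incoming object looks older, so it is dropped.
    }
    cache.put(name, new Entry(version, payload));
    return true;
  }

  // Returns the cached payload, or null if the object is unknown.
  String get(String name) {
    Entry e = cache.get(name);
    return e == null ? null : e.payload;
  }

  public static void main(String[] args) {
    VersionedCacheSketch impalad = new VersionedCacheSketch();
    // Before the restart: the table is cached at a large version.
    impalad.add("db.t", 10000L, "partitions=[p1]");
    // After the restart (assumption: versions restart from a small value).
    boolean applied = impalad.add("db.t", 12L, "partitions=[p1, p2]");
    System.out.println("update applied: " + applied);
    System.out.println("cached: " + impalad.get("db.t"));
    // An object created after the restart has no cached entry to compare with.
    System.out.println("new table applied: " + impalad.add("db.new_t", 13L, "partitions=[]"));
  }
}
```

In this model the update for the existing table is rejected because 12 is smaller than 10000, while the newly created table is accepted because nothing is cached under its name.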

Please help confirm whether this analysis is correct. If not, is there any
other possible cause of the problem? If so, it may be a bug; do you have any
suggestions for avoiding it?

Thanks a lot.

Re: Metadata in cluster is out of sync

Posted by yu feng <ol...@gmail.com>.

By "new metadata will be lost" I mean that subsequent metadata operations
on an existing table will be lost until the table's version catches up with
the older version. I don't think any operation can recover from it, because
the impalad updates its locally cached metadata by comparing the new version
with the older version.

I tried some operations that trigger a reload of the table's metadata and
generate a new version, such as refresh and alter table. But the new metadata
is always lost until the catalog version becomes larger than the older one.
For newly created catalog objects (such as create table / create database),
the metadata is up to date.

I think it is a bug. We need to keep the older catalogServiceId_ until the
full new metadata (the non-delta update pushed by statestored) has been
applied, even though all metadata operations will get an EXCEPTION during
this time gap. Perhaps there are better solutions.
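
A rough sketch of that idea, to show the intended ordering only. The class ServiceIdGuardSketch, the isFullUpdate flag, and the use of null in place of INITIAL_CATALOG_SERVICE_ID are all hypothetical; this is not a patch against Impala, and a real change would also have to rebuild the cached objects when the full update is applied.

```java
import java.util.UUID;

// Hypothetical guard that adopts a new catalog service ID only when a full
// (non-delta) update is applied. Not Impala code.
class ServiceIdGuardSketch {
  // null stands in for INITIAL_CATALOG_SERVICE_ID (assumption of this sketch).
  private UUID catalogServiceId = null;

  UUID currentServiceId() {
    return catalogServiceId;
  }

  // Called for every incoming update, whether it is a direct response to a
  // DDL/DML request or a topic update pushed by statestored.
  void applyUpdate(UUID incomingServiceId, boolean isFullUpdate) {
    boolean firstRun = (catalogServiceId == null);
    if (firstRun || catalogServiceId.equals(incomingServiceId)) {
      catalogServiceId = incomingServiceId;
      return; // Same catalogd (or first contact): apply as usual.
    }
    if (!isFullUpdate) {
      // Keep the old ID so that the next statestored update still sees the
      // mismatch and a full topic update is requested.
      throw new IllegalStateException(
          "Detected catalog service ID change. Aborting updateCatalog()");
    }
    // Full update from the new catalogd: only now switch to the new ID.
    catalogServiceId = incomingServiceId;
  }

  public static void main(String[] args) {
    UUID oldId = new UUID(0L, 1L);
    UUID newId = new UUID(0L, 2L);
    ServiceIdGuardSketch impalad = new ServiceIdGuardSketch();
    impalad.applyUpdate(oldId, true);
    try {
      // Direct response from the restarted catalogd (the step 6 situation).
      impalad.applyUpdate(newId, false);
    } catch (IllegalStateException e) {
      System.out.println("rejected, ID still old: "
          + oldId.equals(impalad.currentServiceId()));
    }
    impalad.applyUpdate(newId, true);
    System.out.println("ID switched after full update: "
        + newId.equals(impalad.currentServiceId()));
  }
}
```

Compared with the code in my first mail, the only difference is when the ID is assigned: the mismatch still raises an exception, but it can no longer hide the restart from the next statestored update.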

Thanks a lot.

2017-06-07 0:58 GMT+08:00 Dimitris Tsirogiannis <dt...@cloudera.com>:

> Hi,
>
> It could be that there is a looming bug here. Can you clarify what "new
> metadata will be lost" means? I suspect that in most cases you can recover
> by running either refresh (if only files were added) or recover partitions
> (if a new partition was dynamically created).
>
> Dimitris

Re: Metadata in cluster is out of sync

Posted by Dimitris Tsirogiannis <dt...@cloudera.com>.

Hi,

It could be that there is a looming bug here. Can you clarify what "new
metadata will be lost" means? I suspect that in most cases you can recover
by running either refresh (if only files were added) or recover partitions
(if a new partition was dynamically created).

Dimitris