You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@impala.apache.org by Piyush Narang <p....@criteo.com> on 2018/01/03 18:28:40 UTC

Issues running compute incremental stats in Impala - alter not possible

Hi folks,

I’m running into some issues when I try to compute incremental stats in Impala that I was hoping someone would be able to help with. I’m able to ‘compute stats’ in Impala on my smaller tables just fine. When I try computing stats incrementally for one of my larger tables, I seem to be running into this error:
> compute incremental stats bi_ full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU');
Query: compute incremental stats bi_full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU')
WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: alter is not possible

Looking at impalad.INFO and catalogd.INFO I don’t see any additional details. I verified that I’m the owner of the tables in HDFS.

Has anyone run into this issue in the past? Any workarounds?

Thanks,

-- Piyush


Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Mostafa Mokhtar <mm...@cloudera.com>.
Hi Piyush,

If you have rough understanding of NDV and row count you can set them
manually via alter table command below:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_perf_stats.html#perf_column_stats_manual

There are efforts to add sampled statistics support to Impala which should
help in your case.

Thanks
Mostafa


On Thu, Jan 4, 2018 at 10:26 AM, Piyush Narang <p....@criteo.com> wrote:

> Seems like my attempt to compute full stats for this table failed as well.
> Like Mostafa pointed out the bulk of the overhead was indeed in the select
> ndv(c1), … query. The query ends up spending over 5 hours there.
> Unfortunately, it seems to fail on the Hive metastore update. Digging into
> that with some folks on our end.
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Piyush Narang <p....@criteo.com>
> *Reply-To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 2:49 PM
>
> *To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> Thanks for pointing this out. Kicked off a run of this. Shall get back
> with breakdowns and how it goes.
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Mostafa Mokhtar <mm...@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 2:13 PM
> *To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> When you looked at the Web-UI while compute stats was running what did you
> find?
>
>
>
> In general compute stats has 3 phases
>
>    1. Get row count per partition using something like select count(*),
>    partition_id from foo group by partition_id
>    2. SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size
>    (CN), Avg size (CN) from foo
>    3. Persist the captured stats in Hive Meta store
>
>
>
> For steps #1 and #2 you should be able to get a good read on progress
> using the "Scan Progress" column.
>
> Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the
> progress there.
>
>
>
> And if the table is large in terms of on disk size I expect steps #2 to
> dominate the time.
>
>
>
> If the operation is CPU not IO bound increasing mt_dop should give you
> good speedup, yet I don't recommend a value greater than 16.
>
>
>
>
>
> On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang <p....@criteo.com>
> wrote:
>
> Thanks Alex and Mostafa. I tried running compute stats full a couple of
> weeks back on this table and it was still going 4 hours later (and I didn’t
> see any progress indication on the Impala web UI). I’ll try and hunt
> through the Hive metastore log files and see if I can find anything.
>
>
>
> Is there something specific you’re looking for in the show create table
> output? I can dump that here (rather than the full table definition and
> details which is pretty verbose and I might need to check if it’s ok to
> share externally).
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Mostafa Mokhtar <mm...@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 1:48 PM
> *To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> Also check the Hive Metastore log files.
>
>
>
> In general if the table has a large number of partitions incremental stats
> will have very large overhead in terms of metadata.
>
>
>
> I would recommend running "compute stats bi_ full" then manually set the
> row count for newly added partitions whenever possible.
>
>
>
> On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <al...@cloudera.com>
> wrote:
>
> Thanks for the report. I have not seen this issue. Looks like the alter
> RPC is rejected by the Hive Metastore. Maybe looking into the
> Hive/Metastore logs would help.
>
>
>
> The SHOW CREATE TABLE output might also help us debug.
>
>
>
> On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com>
> wrote:
>
> Hi folks,
>
>
>
> I’m running into some issues when I try to compute incremental stats in
> Impala that I was hoping someone would be able to help with. I’m able to
> ‘compute stats’ in Impala on my smaller tables just fine. When I try
> computing stats incrementally for one of my larger tables, I seem to be
> running into this error:
>
> > compute incremental stats bi_ full partition (param1=0,day='2017-10-04',
> hour=00,host_platform='EU');
>
> Query: compute incremental stats bi_full partition
> (param1=0,day='2017-10-04',hour=00,host_platform='EU')
>
> WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to
> Hive Metastore:
>
> CAUSED BY: InvalidOperationException: alter is not possible
>
>
>
> Looking at impalad.INFO and catalogd.INFO I don’t see any additional
> details. I verified that I’m the owner of the tables in HDFS.
>
>
>
> Has anyone run into this issue in the past? Any workarounds?
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>
>
>
>
>
>
>

Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Alexander Behm <al...@cloudera.com>.
Thanks for the update. Please let us know if you find out what happened on
the Hive side. We might be able to help.

On Thu, Jan 4, 2018 at 10:26 AM, Piyush Narang <p....@criteo.com> wrote:

> Seems like my attempt to compute full stats for this table failed as well.
> Like Mostafa pointed out the bulk of the overhead was indeed in the select
> ndv(c1), … query. The query ends up spending over 5 hours there.
> Unfortunately, it seems to fail on the Hive metastore update. Digging into
> that with some folks on our end.
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Piyush Narang <p....@criteo.com>
> *Reply-To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 2:49 PM
>
> *To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> Thanks for pointing this out. Kicked off a run of this. Shall get back
> with breakdowns and how it goes.
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Mostafa Mokhtar <mm...@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 2:13 PM
> *To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> When you looked at the Web-UI while compute stats was running what did you
> find?
>
>
>
> In general compute stats has 3 phases
>
>    1. Get row count per partition using something like select count(*),
>    partition_id from foo group by partition_id
>    2. SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size
>    (CN), Avg size (CN) from foo
>    3. Persist the captured stats in Hive Meta store
>
>
>
> For steps #1 and #2 you should be able to get a good read on progress
> using the "Scan Progress" column.
>
> Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the
> progress there.
>
>
>
> And if the table is large in terms of on disk size I expect steps #2 to
> dominate the time.
>
>
>
> If the operation is CPU not IO bound increasing mt_dop should give you
> good speedup, yet I don't recommend a value greater than 16.
>
>
>
>
>
> On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang <p....@criteo.com>
> wrote:
>
> Thanks Alex and Mostafa. I tried running compute stats full a couple of
> weeks back on this table and it was still going 4 hours later (and I didn’t
> see any progress indication on the Impala web UI). I’ll try and hunt
> through the Hive metastore log files and see if I can find anything.
>
>
>
> Is there something specific you’re looking for in the show create table
> output? I can dump that here (rather than the full table definition and
> details which is pretty verbose and I might need to check if it’s ok to
> share externally).
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Mostafa Mokhtar <mm...@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 1:48 PM
> *To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> Also check the Hive Metastore log files.
>
>
>
> In general if the table has a large number of partitions incremental stats
> will have very large overhead in terms of metadata.
>
>
>
> I would recommend running "compute stats bi_ full" then manually set the
> row count for newly added partitions whenever possible.
>
>
>
> On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <al...@cloudera.com>
> wrote:
>
> Thanks for the report. I have not seen this issue. Looks like the alter
> RPC is rejected by the Hive Metastore. Maybe looking into the
> Hive/Metastore logs would help.
>
>
>
> The SHOW CREATE TABLE output might also help us debug.
>
>
>
> On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com>
> wrote:
>
> Hi folks,
>
>
>
> I’m running into some issues when I try to compute incremental stats in
> Impala that I was hoping someone would be able to help with. I’m able to
> ‘compute stats’ in Impala on my smaller tables just fine. When I try
> computing stats incrementally for one of my larger tables, I seem to be
> running into this error:
>
> > compute incremental stats bi_ full partition (param1=0,day='2017-10-04',
> hour=00,host_platform='EU');
>
> Query: compute incremental stats bi_full partition
> (param1=0,day='2017-10-04',hour=00,host_platform='EU')
>
> WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to
> Hive Metastore:
>
> CAUSED BY: InvalidOperationException: alter is not possible
>
>
>
> Looking at impalad.INFO and catalogd.INFO I don’t see any additional
> details. I verified that I’m the owner of the tables in HDFS.
>
>
>
> Has anyone run into this issue in the past? Any workarounds?
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>
>
>
>
>
>
>

Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Piyush Narang <p....@criteo.com>.
Seems like my attempt to compute full stats for this table failed as well. Like Mostafa pointed out the bulk of the overhead was indeed in the select ndv(c1), … query. The query ends up spending over 5 hours there. Unfortunately, it seems to fail on the Hive metastore update. Digging into that with some folks on our end.

-- Piyush


From: Piyush Narang <p....@criteo.com>
Reply-To: "user@impala.apache.org" <us...@impala.apache.org>
Date: Wednesday, January 3, 2018 at 2:49 PM
To: "user@impala.apache.org" <us...@impala.apache.org>
Subject: Re: Issues running compute incremental stats in Impala - alter not possible

Thanks for pointing this out. Kicked off a run of this. Shall get back with breakdowns and how it goes.

-- Piyush


From: Mostafa Mokhtar <mm...@cloudera.com>
Reply-To: "user@impala.apache.org" <us...@impala.apache.org>
Date: Wednesday, January 3, 2018 at 2:13 PM
To: "user@impala.apache.org" <us...@impala.apache.org>
Subject: Re: Issues running compute incremental stats in Impala - alter not possible

When you looked at the Web-UI while compute stats was running what did you find?

In general compute stats has 3 phases

  1.  Get row count per partition using something like select count(*), partition_id from foo group by partition_id
  2.  SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size (CN), Avg size (CN) from foo
  3.  Persist the captured stats in Hive Meta store

For steps #1 and #2 you should be able to get a good read on progress using the "Scan Progress" column.
Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the progress there.

And if the table is large in terms of on disk size I expect steps #2 to dominate the time.

If the operation is CPU not IO bound increasing mt_dop should give you good speedup, yet I don't recommend a value greater than 16.


On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang <p....@criteo.com>> wrote:
Thanks Alex and Mostafa. I tried running compute stats full a couple of weeks back on this table and it was still going 4 hours later (and I didn’t see any progress indication on the Impala web UI). I’ll try and hunt through the Hive metastore log files and see if I can find anything.

Is there something specific you’re looking for in the show create table output? I can dump that here (rather than the full table definition and details which is pretty verbose and I might need to check if it’s ok to share externally).

-- Piyush


From: Mostafa Mokhtar <mm...@cloudera.com>>
Reply-To: "user@impala.apache.org<ma...@impala.apache.org>" <us...@impala.apache.org>>
Date: Wednesday, January 3, 2018 at 1:48 PM
To: "user@impala.apache.org<ma...@impala.apache.org>" <us...@impala.apache.org>>
Subject: Re: Issues running compute incremental stats in Impala - alter not possible

Also check the Hive Metastore log files.

In general if the table has a large number of partitions incremental stats will have very large overhead in terms of metadata.

I would recommend running "compute stats bi_ full" then manually set the row count for newly added partitions whenever possible.

On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <al...@cloudera.com>> wrote:
Thanks for the report. I have not seen this issue. Looks like the alter RPC is rejected by the Hive Metastore. Maybe looking into the Hive/Metastore logs would help.

The SHOW CREATE TABLE output might also help us debug.

On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com>> wrote:
Hi folks,

I’m running into some issues when I try to compute incremental stats in Impala that I was hoping someone would be able to help with. I’m able to ‘compute stats’ in Impala on my smaller tables just fine. When I try computing stats incrementally for one of my larger tables, I seem to be running into this error:
> compute incremental stats bi_ full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU');
Query: compute incremental stats bi_full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU')
WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: alter is not possible

Looking at impalad.INFO and catalogd.INFO I don’t see any additional details. I verified that I’m the owner of the tables in HDFS.

Has anyone run into this issue in the past? Any workarounds?

Thanks,

-- Piyush





Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Piyush Narang <p....@criteo.com>.
Thanks for pointing this out. Kicked off a run of this. Shall get back with breakdowns and how it goes.

-- Piyush


From: Mostafa Mokhtar <mm...@cloudera.com>
Reply-To: "user@impala.apache.org" <us...@impala.apache.org>
Date: Wednesday, January 3, 2018 at 2:13 PM
To: "user@impala.apache.org" <us...@impala.apache.org>
Subject: Re: Issues running compute incremental stats in Impala - alter not possible

When you looked at the Web-UI while compute stats was running what did you find?

In general compute stats has 3 phases

  1.  Get row count per partition using something like select count(*), partition_id from foo group by partition_id
  2.  SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size (CN), Avg size (CN) from foo
  3.  Persist the captured stats in Hive Meta store

For steps #1 and #2 you should be able to get a good read on progress using the "Scan Progress" column.
Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the progress there.

And if the table is large in terms of on disk size I expect steps #2 to dominate the time.

If the operation is CPU not IO bound increasing mt_dop should give you good speedup, yet I don't recommend a value greater than 16.


On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang <p....@criteo.com>> wrote:
Thanks Alex and Mostafa. I tried running compute stats full a couple of weeks back on this table and it was still going 4 hours later (and I didn’t see any progress indication on the Impala web UI). I’ll try and hunt through the Hive metastore log files and see if I can find anything.

Is there something specific you’re looking for in the show create table output? I can dump that here (rather than the full table definition and details which is pretty verbose and I might need to check if it’s ok to share externally).

-- Piyush


From: Mostafa Mokhtar <mm...@cloudera.com>>
Reply-To: "user@impala.apache.org<ma...@impala.apache.org>" <us...@impala.apache.org>>
Date: Wednesday, January 3, 2018 at 1:48 PM
To: "user@impala.apache.org<ma...@impala.apache.org>" <us...@impala.apache.org>>
Subject: Re: Issues running compute incremental stats in Impala - alter not possible

Also check the Hive Metastore log files.

In general if the table has a large number of partitions incremental stats will have very large overhead in terms of metadata.

I would recommend running "compute stats bi_ full" then manually set the row count for newly added partitions whenever possible.

On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <al...@cloudera.com>> wrote:
Thanks for the report. I have not seen this issue. Looks like the alter RPC is rejected by the Hive Metastore. Maybe looking into the Hive/Metastore logs would help.

The SHOW CREATE TABLE output might also help us debug.

On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com>> wrote:
Hi folks,

I’m running into some issues when I try to compute incremental stats in Impala that I was hoping someone would be able to help with. I’m able to ‘compute stats’ in Impala on my smaller tables just fine. When I try computing stats incrementally for one of my larger tables, I seem to be running into this error:
> compute incremental stats bi_ full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU');
Query: compute incremental stats bi_full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU')
WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: alter is not possible

Looking at impalad.INFO and catalogd.INFO I don’t see any additional details. I verified that I’m the owner of the tables in HDFS.

Has anyone run into this issue in the past? Any workarounds?

Thanks,

-- Piyush





Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Mostafa Mokhtar <mm...@cloudera.com>.
When you looked at the Web-UI while compute stats was running what did you
find?

In general compute stats has 3 phases

   1. Get row count per partition using something like select count(*),
   partition_id from foo group by partition_id
   2. SELECT NDV(C1), Max size (C1), Avg size (C1), ... NDV(CN), Max size
   (CN), Avg size (CN) from foo
   3. Persist the captured stats in Hive Meta store


For steps #1 and #2 you should be able to get a good read on progress using
the "Scan Progress" column.
Step 3 if you tail /var/log/catalogd/catalogd.INFO you should see the
progress there.

And if the table is large in terms of on disk size I expect steps #2 to
dominate the time.

If the operation is CPU not IO bound increasing mt_dop should give you good
speedup, yet I don't recommend a value greater than 16.


On Wed, Jan 3, 2018 at 11:03 AM, Piyush Narang <p....@criteo.com> wrote:

> Thanks Alex and Mostafa. I tried running compute stats full a couple of
> weeks back on this table and it was still going 4 hours later (and I didn’t
> see any progress indication on the Impala web UI). I’ll try and hunt
> through the Hive metastore log files and see if I can find anything.
>
>
>
> Is there something specific you’re looking for in the show create table
> output? I can dump that here (rather than the full table definition and
> details which is pretty verbose and I might need to check if it’s ok to
> share externally).
>
>
>
> -- Piyush
>
>
>
>
>
> *From: *Mostafa Mokhtar <mm...@cloudera.com>
> *Reply-To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Date: *Wednesday, January 3, 2018 at 1:48 PM
> *To: *"user@impala.apache.org" <us...@impala.apache.org>
> *Subject: *Re: Issues running compute incremental stats in Impala - alter
> not possible
>
>
>
> Also check the Hive Metastore log files.
>
>
>
> In general if the table has a large number of partitions incremental stats
> will have very large overhead in terms of metadata.
>
>
>
> I would recommend running "compute stats bi_ full" then manually set the
> row count for newly added partitions whenever possible.
>
>
>
> On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <al...@cloudera.com>
> wrote:
>
> Thanks for the report. I have not seen this issue. Looks like the alter
> RPC is rejected by the Hive Metastore. Maybe looking into the
> Hive/Metastore logs would help.
>
>
>
> The SHOW CREATE TABLE output might also help us debug.
>
>
>
> On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com>
> wrote:
>
> Hi folks,
>
>
>
> I’m running into some issues when I try to compute incremental stats in
> Impala that I was hoping someone would be able to help with. I’m able to
> ‘compute stats’ in Impala on my smaller tables just fine. When I try
> computing stats incrementally for one of my larger tables, I seem to be
> running into this error:
>
> > compute incremental stats bi_ full partition (param1=0,day='2017-10-04',
> hour=00,host_platform='EU');
>
> Query: compute incremental stats bi_full partition
> (param1=0,day='2017-10-04',hour=00,host_platform='EU')
>
> WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to
> Hive Metastore:
>
> CAUSED BY: InvalidOperationException: alter is not possible
>
>
>
> Looking at impalad.INFO and catalogd.INFO I don’t see any additional
> details. I verified that I’m the owner of the tables in HDFS.
>
>
>
> Has anyone run into this issue in the past? Any workarounds?
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>
>
>
>
>

Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Piyush Narang <p....@criteo.com>.
Thanks Alex and Mostafa. I tried running compute stats full a couple of weeks back on this table and it was still going 4 hours later (and I didn’t see any progress indication on the Impala web UI). I’ll try and hunt through the Hive metastore log files and see if I can find anything.

Is there something specific you’re looking for in the show create table output? I can dump that here (rather than the full table definition and details which is pretty verbose and I might need to check if it’s ok to share externally).

-- Piyush


From: Mostafa Mokhtar <mm...@cloudera.com>
Reply-To: "user@impala.apache.org" <us...@impala.apache.org>
Date: Wednesday, January 3, 2018 at 1:48 PM
To: "user@impala.apache.org" <us...@impala.apache.org>
Subject: Re: Issues running compute incremental stats in Impala - alter not possible

Also check the Hive Metastore log files.

In general if the table has a large number of partitions incremental stats will have very large overhead in terms of metadata.

I would recommend running "compute stats bi_ full" then manually set the row count for newly added partitions whenever possible.

On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <al...@cloudera.com>> wrote:
Thanks for the report. I have not seen this issue. Looks like the alter RPC is rejected by the Hive Metastore. Maybe looking into the Hive/Metastore logs would help.

The SHOW CREATE TABLE output might also help us debug.

On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com>> wrote:
Hi folks,

I’m running into some issues when I try to compute incremental stats in Impala that I was hoping someone would be able to help with. I’m able to ‘compute stats’ in Impala on my smaller tables just fine. When I try computing stats incrementally for one of my larger tables, I seem to be running into this error:
> compute incremental stats bi_ full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU');
Query: compute incremental stats bi_full partition (param1=0,day='2017-10-04',hour=00,host_platform='EU')
WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive Metastore:
CAUSED BY: InvalidOperationException: alter is not possible

Looking at impalad.INFO and catalogd.INFO I don’t see any additional details. I verified that I’m the owner of the tables in HDFS.

Has anyone run into this issue in the past? Any workarounds?

Thanks,

-- Piyush




Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Mostafa Mokhtar <mm...@cloudera.com>.
Also check the Hive Metastore log files.

In general if the table has a large number of partitions incremental stats
will have very large overhead in terms of metadata.

I would recommend running "compute stats bi_ full" then manually set the
row count for newly added partitions whenever possible.

On Wed, Jan 3, 2018 at 10:36 AM, Alexander Behm <al...@cloudera.com>
wrote:

> Thanks for the report. I have not seen this issue. Looks like the alter
> RPC is rejected by the Hive Metastore. Maybe looking into the
> Hive/Metastore logs would help.
>
> The SHOW CREATE TABLE output might also help us debug.
>
> On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com>
> wrote:
>
>> Hi folks,
>>
>>
>>
>> I’m running into some issues when I try to compute incremental stats in
>> Impala that I was hoping someone would be able to help with. I’m able to
>> ‘compute stats’ in Impala on my smaller tables just fine. When I try
>> computing stats incrementally for one of my larger tables, I seem to be
>> running into this error:
>>
>> > compute incremental stats bi_ full partition
>> (param1=0,day='2017-10-04',hour=00,host_platform='EU');
>>
>> Query: compute incremental stats bi_full partition
>> (param1=0,day='2017-10-04',hour=00,host_platform='EU')
>>
>> WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to
>> Hive Metastore:
>>
>> CAUSED BY: InvalidOperationException: alter is not possible
>>
>>
>>
>> Looking at impalad.INFO and catalogd.INFO I don’t see any additional
>> details. I verified that I’m the owner of the tables in HDFS.
>>
>>
>>
>> Has anyone run into this issue in the past? Any workarounds?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> -- Piyush
>>
>>
>>
>
>

Re: Issues running compute incremental stats in Impala - alter not possible

Posted by Alexander Behm <al...@cloudera.com>.
Thanks for the report. I have not seen this issue. Looks like the alter RPC
is rejected by the Hive Metastore. Maybe looking into the Hive/Metastore
logs would help.

The SHOW CREATE TABLE output might also help us debug.

On Wed, Jan 3, 2018 at 10:28 AM, Piyush Narang <p....@criteo.com> wrote:

> Hi folks,
>
>
>
> I’m running into some issues when I try to compute incremental stats in
> Impala that I was hoping someone would be able to help with. I’m able to
> ‘compute stats’ in Impala on my smaller tables just fine. When I try
> computing stats incrementally for one of my larger tables, I seem to be
> running into this error:
>
> > compute incremental stats bi_ full partition (param1=0,day='2017-10-04',
> hour=00,host_platform='EU');
>
> Query: compute incremental stats bi_full partition
> (param1=0,day='2017-10-04',hour=00,host_platform='EU')
>
> WARNINGS: ImpalaRuntimeException: Error making 'alter_partitions' RPC to
> Hive Metastore:
>
> CAUSED BY: InvalidOperationException: alter is not possible
>
>
>
> Looking at impalad.INFO and catalogd.INFO I don’t see any additional
> details. I verified that I’m the owner of the tables in HDFS.
>
>
>
> Has anyone run into this issue in the past? Any workarounds?
>
>
>
> Thanks,
>
>
>
> -- Piyush
>
>
>