You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@impala.apache.org by Fawze Abujaber <fa...@gmail.com> on 2018/05/21 11:52:19 UTC
unable to run drop stats on a partitioned tables with INT fields
Hi Community,
I have an external impala table that partitioned by year, month and day,
these field definitions are INT.
The files stored in the hdfs like:
hdfs://table_name/year=2018/month=04/day=26
Query: show partitions table_name
+-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache
Replication | Format | Incremental stats | Location
|
+-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| 2017 | 10 | 8 | -1 | 0 | 0B | NOT CACHED | NOT
CACHED | PARQUET | false |
hdfs://table_name/year=2017/month=10/day=08 |
| 2017 | 10 | 9 | -1 | 0 | 0B | NOT CACHED | NOT
CACHED | PARQUET | false |
hdfs://table_name/year=2017/month=10/day=09
|
As you can see that the partitions mismatch with the files in the hdfs, the
partition looks like 2017 10 8 while the HDFS file
hdfs://table_name/year=2017/month=10/day=08.
I want to move to COMPUTE INCREMENTAL STATS so when i run the drop stat
statement i'm getting the following error:
RROR: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive
Metastore:
CAUSED BY: InvalidOperationException: Alter partition operation failed:
NoSuchObjectException(message:partition values=[2017, 10, 8])
I read some blog that suggested either to alter the table partitions into
string instead int or recreate the tables without the leading ZERO.
Since i have around 150 tables and i have 100's of ETLs and impala queries
that using these tables, i'm looking forward to search if there is another
solution for this issue.
--
Take Care
Fawze Abujaber
Re: unable to run drop stats on a partitioned tables with INT fields
Posted by Fawze Abujaber <fa...@gmail.com>.
Thanks Jom for your response.
I was intersting to have compute stats on these tables, i already has a
script that performing alter table recover partitions on these tables.
On Tue, May 22, 2018 at 1:39 AM, Jim Apple <jb...@cloudera.com> wrote:
> This is the ticket:
>
> https://issues.apache.org/jira/browse/IMPALA-3976
>
> Looks like that does not address workarounds in much detail, though
> INVALIDATE METADATA is suggested. That's expensive, as you probably
> know.
>
> On Mon, May 21, 2018 at 4:52 AM, Fawze Abujaber <fa...@gmail.com> wrote:
> > Hi Community,
> >
> > I have an external impala table that partitioned by year, month and day,
> > these field definitions are INT.
> >
> > The files stored in the hdfs like:
> >
> > hdfs://table_name/year=2018/month=04/day=26
> >
> > Query: show partitions table_name
> > +-------+-------+-----+-------+--------+---------+----------
> ----+-------------------+---------+-------------------+-----
> ------------------------------------------------------------
> ------------------------------------------------------------
> -----------------+
> > | year | month | day | #Rows | #Files | Size | Bytes Cached | Cache
> > Replication | Format | Incremental stats | Location
> > |
> > +-------+-------+-----+-------+--------+---------+----------
> ----+-------------------+---------+-------------------+-----
> ------------------------------------------------------------
> ------------------------------------------------------------
> -----------------+
> > | 2017 | 10 | 8 | -1 | 0 | 0B | NOT CACHED | NOT
> CACHED
> > | PARQUET | false | hdfs://table_name/year=2017/
> month=10/day=08
> > |
> > | 2017 | 10 | 9 | -1 | 0 | 0B | NOT CACHED | NOT
> CACHED
> > | PARQUET | false | hdfs://table_name/year=2017/
> month=10/day=09
> > |
> >
> >
> > As you can see that the partitions mismatch with the files in the hdfs,
> the
> > partition looks like 2017 10 8 while the HDFS file
> > hdfs://table_name/year=2017/month=10/day=08.
> >
> > I want to move to COMPUTE INCREMENTAL STATS so when i run the drop stat
> > statement i'm getting the following error:
> >
> > RROR: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive
> > Metastore:
> > CAUSED BY: InvalidOperationException: Alter partition operation failed:
> > NoSuchObjectException(message:partition values=[2017, 10, 8])
> >
> >
> > I read some blog that suggested either to alter the table partitions into
> > string instead int or recreate the tables without the leading ZERO.
> >
> >
> > Since i have around 150 tables and i have 100's of ETLs and impala
> queries
> > that using these tables, i'm looking forward to search if there is
> another
> > solution for this issue.
> >
> >
> > --
> > Take Care
> > Fawze Abujaber
>
--
Take Care
Fawze Abujaber
Re: unable to run drop stats on a partitioned tables with INT fields
Posted by Jim Apple <jb...@cloudera.com>.
This is the ticket:
https://issues.apache.org/jira/browse/IMPALA-3976
Looks like that does not address workarounds in much detail, though
INVALIDATE METADATA is suggested. That's expensive, as you probably
know.
On Mon, May 21, 2018 at 4:52 AM, Fawze Abujaber <fa...@gmail.com> wrote:
> Hi Community,
>
> I have an external impala table that partitioned by year, month and day,
> these field definitions are INT.
>
> The files stored in the hdfs like:
>
> hdfs://table_name/year=2018/month=04/day=26
>
> Query: show partitions table_name
> +-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
> | year | month | day | #Rows | #Files | Size | Bytes Cached | Cache
> Replication | Format | Incremental stats | Location
> |
> +-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
> | 2017 | 10 | 8 | -1 | 0 | 0B | NOT CACHED | NOT CACHED
> | PARQUET | false | hdfs://table_name/year=2017/month=10/day=08
> |
> | 2017 | 10 | 9 | -1 | 0 | 0B | NOT CACHED | NOT CACHED
> | PARQUET | false | hdfs://table_name/year=2017/month=10/day=09
> |
>
>
> As you can see that the partitions mismatch with the files in the hdfs, the
> partition looks like 2017 10 8 while the HDFS file
> hdfs://table_name/year=2017/month=10/day=08.
>
> I want to move to COMPUTE INCREMENTAL STATS so when i run the drop stat
> statement i'm getting the following error:
>
> RROR: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive
> Metastore:
> CAUSED BY: InvalidOperationException: Alter partition operation failed:
> NoSuchObjectException(message:partition values=[2017, 10, 8])
>
>
> I read some blog that suggested either to alter the table partitions into
> string instead int or recreate the tables without the leading ZERO.
>
>
> Since i have around 150 tables and i have 100's of ETLs and impala queries
> that using these tables, i'm looking forward to search if there is another
> solution for this issue.
>
>
> --
> Take Care
> Fawze Abujaber