You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@impala.apache.org by Fawze Abujaber <fa...@gmail.com> on 2018/05/21 11:52:19 UTC

unable to run drop stats on a partitioned tables with INT fields

Hi Community,

I have an external impala table that partitioned by year, month and day,
these field definitions are INT.

The files stored in the hdfs like:

hdfs://table_name/year=2018/month=04/day=26

Query: show partitions table_name
+-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache
Replication | Format  | Incremental stats | Location

                                 |
+-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
| 2017  | 10    | 8   | -1    | 0      | 0B      | NOT CACHED   | NOT
CACHED        | PARQUET | false             |
hdfs://table_name/year=2017/month=10/day=08 |
| 2017  | 10    | 9   | -1    | 0      | 0B      | NOT CACHED   | NOT
CACHED        | PARQUET | false             |
hdfs://table_name/year=2017/month=10/day=09
|


As you can see that the partitions mismatch with the files in the hdfs, the
partition looks like 2017 10  8 while the HDFS file
hdfs://table_name/year=2017/month=10/day=08.

I want to move to COMPUTE INCREMENTAL STATS so when i run the drop stat
statement i'm getting the following error:

RROR: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive
Metastore:
CAUSED BY: InvalidOperationException: Alter partition operation failed:
NoSuchObjectException(message:partition values=[2017, 10, 8])


I read some blog that suggested either to alter the table partitions into
string instead int or recreate the tables without the leading ZERO.


Since i have around 150 tables and i have 100's of ETLs and impala queries
that using these tables, i'm looking forward to search if there is another
solution for this issue.


-- 
Take Care
Fawze Abujaber

Re: unable to run drop stats on a partitioned tables with INT fields

Posted by Fawze Abujaber <fa...@gmail.com>.
Thanks Jom for your response.

I was intersting to have compute stats on these tables, i already has a
script that performing alter table recover partitions on these tables.

On Tue, May 22, 2018 at 1:39 AM, Jim Apple <jb...@cloudera.com> wrote:

> This is the ticket:
>
> https://issues.apache.org/jira/browse/IMPALA-3976
>
> Looks like that does not address workarounds in much detail, though
> INVALIDATE METADATA is suggested. That's expensive, as you probably
> know.
>
> On Mon, May 21, 2018 at 4:52 AM, Fawze Abujaber <fa...@gmail.com> wrote:
> > Hi Community,
> >
> > I have an external impala table that partitioned by year, month and day,
> > these field definitions are INT.
> >
> > The files stored in the hdfs like:
> >
> > hdfs://table_name/year=2018/month=04/day=26
> >
> > Query: show partitions table_name
> > +-------+-------+-----+-------+--------+---------+----------
> ----+-------------------+---------+-------------------+-----
> ------------------------------------------------------------
> ------------------------------------------------------------
> -----------------+
> > | year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache
> > Replication | Format  | Incremental stats | Location
> > |
> > +-------+-------+-----+-------+--------+---------+----------
> ----+-------------------+---------+-------------------+-----
> ------------------------------------------------------------
> ------------------------------------------------------------
> -----------------+
> > | 2017  | 10    | 8   | -1    | 0      | 0B      | NOT CACHED   | NOT
> CACHED
> > | PARQUET | false             | hdfs://table_name/year=2017/
> month=10/day=08
> > |
> > | 2017  | 10    | 9   | -1    | 0      | 0B      | NOT CACHED   | NOT
> CACHED
> > | PARQUET | false             | hdfs://table_name/year=2017/
> month=10/day=09
> > |
> >
> >
> > As you can see that the partitions mismatch with the files in the hdfs,
> the
> > partition looks like 2017 10  8 while the HDFS file
> > hdfs://table_name/year=2017/month=10/day=08.
> >
> > I want to move to COMPUTE INCREMENTAL STATS so when i run the drop stat
> > statement i'm getting the following error:
> >
> > RROR: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive
> > Metastore:
> > CAUSED BY: InvalidOperationException: Alter partition operation failed:
> > NoSuchObjectException(message:partition values=[2017, 10, 8])
> >
> >
> > I read some blog that suggested either to alter the table partitions into
> > string instead int or recreate the tables without the leading ZERO.
> >
> >
> > Since i have around 150 tables and i have 100's of ETLs and impala
> queries
> > that using these tables, i'm looking forward to search if there is
> another
> > solution for this issue.
> >
> >
> > --
> > Take Care
> > Fawze Abujaber
>



-- 
Take Care
Fawze Abujaber

Re: unable to run drop stats on a partitioned tables with INT fields

Posted by Jim Apple <jb...@cloudera.com>.
This is the ticket:

https://issues.apache.org/jira/browse/IMPALA-3976

Looks like that does not address workarounds in much detail, though
INVALIDATE METADATA is suggested. That's expensive, as you probably
know.

On Mon, May 21, 2018 at 4:52 AM, Fawze Abujaber <fa...@gmail.com> wrote:
> Hi Community,
>
> I have an external impala table that partitioned by year, month and day,
> these field definitions are INT.
>
> The files stored in the hdfs like:
>
> hdfs://table_name/year=2018/month=04/day=26
>
> Query: show partitions table_name
> +-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
> | year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache
> Replication | Format  | Incremental stats | Location
> |
> +-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------------------------------------------------------------------------+
> | 2017  | 10    | 8   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED
> | PARQUET | false             | hdfs://table_name/year=2017/month=10/day=08
> |
> | 2017  | 10    | 9   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED
> | PARQUET | false             | hdfs://table_name/year=2017/month=10/day=09
> |
>
>
> As you can see that the partitions mismatch with the files in the hdfs, the
> partition looks like 2017 10  8 while the HDFS file
> hdfs://table_name/year=2017/month=10/day=08.
>
> I want to move to COMPUTE INCREMENTAL STATS so when i run the drop stat
> statement i'm getting the following error:
>
> RROR: ImpalaRuntimeException: Error making 'alter_partitions' RPC to Hive
> Metastore:
> CAUSED BY: InvalidOperationException: Alter partition operation failed:
> NoSuchObjectException(message:partition values=[2017, 10, 8])
>
>
> I read some blog that suggested either to alter the table partitions into
> string instead int or recreate the tables without the leading ZERO.
>
>
> Since i have around 150 tables and i have 100's of ETLs and impala queries
> that using these tables, i'm looking forward to search if there is another
> solution for this issue.
>
>
> --
> Take Care
> Fawze Abujaber