Posted to issues-all@impala.apache.org by "bharath v (JIRA)" <ji...@apache.org> on 2018/06/29 22:12:00 UTC

[jira] [Comment Edited] (IMPALA-7225) Refresh on single partition resets partition's row count to -1

    [ https://issues.apache.org/jira/browse/IMPALA-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16528311#comment-16528311 ] 

bharath v edited comment on IMPALA-7225 at 6/29/18 10:11 PM:
-------------------------------------------------------------

I guess the problem is somewhere in the code below, where the marked lines are missing. But what if the data changes, say 90% of the files are removed? Is setting the row count to -1 acceptable in such cases?

 
{noformat}
public void reloadPartition(HdfsPartition oldPartition, Partition hmsPartition)
    throws CatalogException {
  HdfsPartition refreshedPartition = createPartition(
      hmsPartition.getSd(), hmsPartition);
  refreshPartitionFileMetadata(refreshedPartition);

  // >>>>>>>>>>>>>>>>>> missing >>>>>>>>>>>>>>>>>>
  // If data is unchanged.
  if (hmsPartition.getParameters() != null) {
    refreshedPartition.setNumRows(
        FeCatalogUtils.getRowCount(hmsPartition.getParameters()));
  }
  // >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

  Preconditions.checkArgument(oldPartition == null
      || HdfsPartition.KV_COMPARATOR.compare(oldPartition, refreshedPartition) == 0);
  dropPartition(oldPartition);
  addPartition(refreshedPartition);
}
{noformat}
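
To make the open question concrete, here is a minimal, self-contained sketch of one possible policy: keep the row count stored in the HMS partition parameters only when the file listing looks unchanged, and fall back to -1 otherwise. This is not Impala's actual fix. The class and method names are hypothetical, the "numRows" key is assumed to be the HMS parameter that holds the row count, and comparing file count and total size is only a rough heuristic for "data is unchanged".

```java
import java.util.HashMap;
import java.util.Map;

public class RowCountSketch {
  // Assumption: the HMS partition parameter that stores the row count.
  static final String ROW_COUNT_KEY = "numRows";

  // Returns the stored row count, or -1 if it is missing or not a valid number.
  static long rowCountFromParams(Map<String, String> params) {
    if (params == null) return -1;
    String value = params.get(ROW_COUNT_KEY);
    if (value == null) return -1;
    try {
      long parsed = Long.parseLong(value.trim());
      return parsed < 0 ? -1 : parsed;
    } catch (NumberFormatException e) {
      return -1;
    }
  }

  // Hypothetical policy: trust the stored count only if the number of files
  // and their total size are the same before and after the refresh.
  static long rowCountAfterRefresh(Map<String, String> params,
      long oldNumFiles, long oldTotalBytes, long newNumFiles, long newTotalBytes) {
    boolean looksUnchanged =
        oldNumFiles == newNumFiles && oldTotalBytes == newTotalBytes;
    return looksUnchanged ? rowCountFromParams(params) : -1;
  }

  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put(ROW_COUNT_KEY, "250");

    // Same file listing: the stored count is kept.
    System.out.println(rowCountAfterRefresh(params, 1, 114842, 1, 114842)); // 250

    // 90% of the files removed: the stored count is stale, so report -1.
    System.out.println(rowCountAfterRefresh(params, 10, 1000000, 1, 100000)); // -1
  }
}
```

Whether a stale count is better or worse than -1 for the planner is exactly the question above; the sketch only shows that the two cases can be told apart cheaply.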
 


was (Author: bharathv):
I guess the problem is somewhere below. But what if the data changes, say 90% of the files are removed? What should we do in such cases? Is setting to -1 acceptable in such cases?

 
{noformat}
public void reloadPartition(HdfsPartition oldPartition, Partition hmsPartition)
    throws CatalogException {
  HdfsPartition refreshedPartition = createPartition(
      hmsPartition.getSd(), hmsPartition);
  refreshPartitionFileMetadata(refreshedPartition);

  // >>>>>>>>>>>>>>>>>> missing >>>>>>>>>>>>>>>>>>
  // If data is unchanged.
  if (hmsPartition.getParameters() != null) {
    refreshedPartition.setNumRows(
        FeCatalogUtils.getRowCount(hmsPartition.getParameters()));
  }
  // >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

  Preconditions.checkArgument(oldPartition == null
      || HdfsPartition.KV_COMPARATOR.compare(oldPartition, refreshedPartition) == 0);
  dropPartition(oldPartition);
  addPartition(refreshedPartition);
}
{noformat}
 

> Refresh on single partition resets partition's row count to -1
> --------------------------------------------------------------
>
>                 Key: IMPALA-7225
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7225
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.10.0, Impala 2.12.0
>            Reporter: Mala Chikka Kempanna
>            Priority: Major
>
> Doing a refresh on a single partition resets its row count to -1.
>  
> {code:java}
> [host-2.x.y.z:21000] > show partitions web_logs_new;
> Query: show partitions web_logs_new
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> | date_col | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> | 2015-11-18 | -1 | 1 | 112.15KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-18 |
> | 2015-11-19 | -1 | 1 | 98.83KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-19 |
> | 2015-11-20 | -1 | 1 | 101.57KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-20 |
> | 2015-11-21 | -1 | 1 | 82.99KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-21 |
> | Total | -1 | 4 | 395.54KB | 0B | | | | |
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> Fetched 5 row(s) in 0.01s
> [host-2.x.y.z:21000] > compute stats web_logs_new;
> Query: compute stats web_logs_new
> +------------------------------------------+
> | summary |
> +------------------------------------------+
> | Updated 4 partition(s) and 28 column(s). |
> +------------------------------------------+
> Fetched 1 row(s) in 1.31s
> [nightly513-unsecure-2.gce.cloudera.com:21000] > show partitions web_logs_new;
> Query: show partitions web_logs_new
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> | date_col | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> | 2015-11-18 | 250 | 1 | 112.15KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-18 |
> | 2015-11-19 | 250 | 1 | 98.83KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-19 |
> | 2015-11-20 | 250 | 1 | 101.57KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-20 |
> | 2015-11-21 | 250 | 1 | 82.99KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-21 |
> | Total | 1000 | 4 | 395.54KB | 0B | | | | |
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> Fetched 5 row(s) in 0.01s
> [host-2.x.y.z:21000] > refresh web_logs_new partition(date_col='2015-11-18');
> Query: refresh web_logs_new partition(date_col='2015-11-18')
> Query submitted at: 2018-06-29 12:53:32 (Coordinator: http://nightly513-unsecure-2.gce.cloudera.com:25000)
> Query progress can be monitored at: http://nightly513-unsecure-2.gce.cloudera.com:25000/query_plan?query_id=7146dedb62cb6503:bc403a8500000000
> Fetched 0 row(s) in 0.06s
> [host-2.x.y.z:21000] > show partitions web_logs_new;
> Query: show partitions web_logs_new
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> | date_col | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> | 2015-11-18 | -1 | 1 | 112.15KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-18 |
> | 2015-11-19 | 250 | 1 | 98.83KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-19 |
> | 2015-11-20 | 250 | 1 | 101.57KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-20 |
> | 2015-11-21 | 250 | 1 | 82.99KB | NOT CACHED | NOT CACHED | TEXT | false | hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-21 |
> | Total | 1000 | 4 | 395.54KB | 0B | | | | |
> +------------+-------+--------+----------+--------------+-------------------+--------+-------------------+---------------------------------------------------------------------------------------------------------+
> Fetched 5 row(s) in 0.01s
>  
> {code}


