Posted to issues@hive.apache.org by "Qifan Chen (Jira)" <ji...@apache.org> on 2021/03/15 14:17:00 UTC

[jira] [Updated] (HIVE-24885) The state of unset low or high value in LongColumnStatsData can not be retrieved

     [ https://issues.apache.org/jira/browse/HIVE-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qifan Chen updated HIVE-24885:
------------------------------
    Description: 
While improving Impala column stats to compute min/max values for columns, we found that the unset state of the low or high value in LongColumnStatsData cannot be retrieved. The following Impala test case, added to MetastoreEventsProcessorTest, illustrates this.

  /**                                                                                          
   * Unset the low and the high value first and then check.                                    
   */                                                                                          
  @Test                                                                                        
  public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {                   
    try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {                           
      List<String> colNames = new ArrayList<String>();                                         
      colNames.add("id");                                                                      
      colNames.add("int_col");                                                                 
      colNames.add("bigint_col");                                                              
      List<ColumnStatisticsObj> colStatsObjs =                                             
          msClient.getHiveClient().getTableColumnStatistics(                                   
              "unique_database", "alltypes", colNames, "impala");                              
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {                                   
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();                        
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();                    
        longColStatsData.unsetLowValue();                                                      
        longColStatsData.unsetHighValue();                                                     
        colStatsData.setLongStats(longColStatsData);                                           
      }                                                                                        
      assertTrue("All good!", true);                                                           
      colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(                        
          "unique_database", "alltypes", colNames, "impala");                                  
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {                                   
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();                        
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();                
        assertFalse("isSetLowValue() should be false", longColStatsData.isSetLowValue());  
        assertFalse(                                                                           
            "isSetHighValue() should be false", longColStatsData.isSetHighValue());            
      }                                                                                        
      assertTrue("All good!", true);                                                           
    // Fail the test on any metastore error instead of letting it pass silently.
    } catch (NoSuchObjectException e) {
      fail(String.format("No such object exception: %s", e));
    } catch (MetaException e) {
      fail(String.format("Metadata exception: %s", e));
    } catch (TException e) {
      fail(String.format("TException: %s", e));
    }
  } 

Both isSetLowValue() and isSetHighValue() are expected to return false in the second loop, since longColStatsData.unsetLowValue() and longColStatsData.unsetHighValue() are called in the first loop.
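
For readers unfamiliar with the set/unset accessors used in the test, here is a minimal sketch of how an "is set" flag can be tracked separately from the value itself, so that a stored 0 is distinguishable from "no value". It is a simplified stand-in written for illustration, not Hive's generated LongColumnStatsData class; the class name LongStatsSketch and its fields are hypothetical.

```java
// Simplified, hypothetical stand-in for a struct with optional long fields.
// It is NOT Hive's LongColumnStatsData; it only mimics the accessor names.
class LongStatsSketch {
  // One bit per optional field records whether that field is currently set.
  private static final int LOW_BIT = 0;
  private static final int HIGH_BIT = 1;

  private long lowValue;
  private long highValue;
  private byte issetBits = 0;

  void setLowValue(long v) { lowValue = v; issetBits |= (1 << LOW_BIT); }
  void setHighValue(long v) { highValue = v; issetBits |= (1 << HIGH_BIT); }

  // Unsetting clears only the flag; the stale value is ignored by callers
  // that check isSet...() first.
  void unsetLowValue() { issetBits &= ~(1 << LOW_BIT); }
  void unsetHighValue() { issetBits &= ~(1 << HIGH_BIT); }

  boolean isSetLowValue() { return (issetBits & (1 << LOW_BIT)) != 0; }
  boolean isSetHighValue() { return (issetBits & (1 << HIGH_BIT)) != 0; }

  long getLowValue() { return lowValue; }
  long getHighValue() { return highValue; }

  public static void main(String[] args) {
    LongStatsSketch stats = new LongStatsSketch();
    stats.setLowValue(0L);     // 0 is a real value, distinct from "unset"
    stats.setHighValue(100L);
    System.out.println(stats.isSetLowValue());   // prints "true"
    stats.unsetLowValue();
    System.out.println(stats.isSetLowValue());   // prints "false"
    System.out.println(stats.isSetHighValue());  // prints "true"
  }
}
```

In this sketch the flag lives only in the in-memory object, which is the state the assertions in the test case inspect after the statistics are fetched again.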

To build and run the test:

mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff -Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue


Table unique_database.alltypes is defined as follows.

 CREATE EXTERNAL TABLE unique_database.alltypes (                                             
   id INT,                                                                                    
   bool_col BOOLEAN,                                                                          
   tinyint_col TINYINT,                                                                       
   smallint_col SMALLINT,                                                                     
   int_col INT,                                                                               
   bigint_col BIGINT,                                                                         
   float_col FLOAT,                                                                           
   double_col DOUBLE,                                                                         
   date_string_col STRING,                                                                    
   string_col STRING,                                                                         
   timestamp_col TIMESTAMP,                                                                   
   year INT                                                                                   
 )                                                                                            
 PARTITIONED BY (                                                                             
   month INT                                                                                  
 )                                                                                            
 STORED AS PARQUET                                                                            
 LOCATION 'hdfs://localhost:20500/test-warehouse/unique_database.db/alltypes'                 
 TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'OBJCAPABILITIES'='EXTREAD,EXTWRITE', 'STATS_GENERATED'='TASK', 'external.table.purge'='TRUE', 'impala.lastComputeStatsTime'='1615492819', 'numRows'='0', 'totalSize'='0')  

The table can be created with the following statements in an Impala environment.

create database if not exists unique_database;                                             
use unique_database;                                             
drop table if exists alltypes;                               
CREATE TABLE alltypes
partitioned by (month)
STORED AS PARQUET
as select * from functional_parquet.alltypes 
;

  was:
During the work to improve Impala column stats to compute min/max for columns, it is found that the state of unset low or high value in LongColumnStatsData can not be retrieved back. This is illustrated in the following Impala test case added to MetastoreEventsProcessorTest. 

  /**                                                                                          
   * Unset the low and the high value first and then check.                                    
   */                                                                                          
  @Test                                                                                        
  public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {                   
    try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {                           
      List<String> colNames = new ArrayList<String>();                                         
      colNames.add("id");                                                                      
      colNames.add("int_col");                                                                 
      colNames.add("bigint_col");                                                              
      List<ColumnStatisticsObj> colStatsObjs =                                             
          msClient.getHiveClient().getTableColumnStatistics(                                   
              "unique_database", "alltypes", colNames, "impala");                              
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {                                   
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();                        
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();                    
        longColStatsData.unsetLowValue();                                                      
        longColStatsData.unsetHighValue();                                                     
        colStatsData.setLongStats(longColStatsData);                                           
      }                                                                                        
      assertTrue("All good!", true);                                                           
      colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(                        
          "unique_database", "alltypes", colNames, "impala");                                  
      for (ColumnStatisticsObj colStatsObj : colStatsObjs) {                                   
        ColumnStatisticsData colStatsData = colStatsObj.getStatsData();                        
        LongColumnStatsData longColStatsData = colStatsData.getLongStats();                
        assertFalse("isSetLowValue() should be false", longColStatsData.isSetLowValue());  
        assertFalse(                                                                           
            "isSetHighValue() should be false", longColStatsData.isSetHighValue());            
      }                                                                                        
      assertTrue("All good!", true);                                                           
    } catch (NoSuchObjectException e) {                                                        
      assertFalse(String.format("No such object exception: %s", e), false);                    
    } catch (MetaException e) {                                                                
      assertFalse(String.format("Metadata exception: %s", e), false);                          
    } catch (TException e) {                                                                   
      assertFalse(String.format("TException: %s", e), false);                                  
    }                                                                                          
  } 

The assertion on isSetLowValue() or isSetHighValue() should be false, since longColStatsData.unsetLowValue() is called in the first loop.

To build the test, 

mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff -Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue


Table unique_database.alltypes is defined as follows with several rows. 

Query: show create table  unique_database.alltypes
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| result                                                                                                                                                                                                                  |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| CREATE EXTERNAL TABLE unique_database.alltypes (                                                                                                                                                                        |
|   id INT,                                                                                                                                                                                                               |
|   bool_col BOOLEAN,                                                                                                                                                                                                     |
|   tinyint_col TINYINT,                                                                                                                                                                                                  |
|   smallint_col SMALLINT,                                                                                                                                                                                                |
|   int_col INT,                                                                                                                                                                                                          |
|   bigint_col BIGINT,                                                                                                                                                                                                    |
|   float_col FLOAT,                                                                                                                                                                                                      |
|   double_col DOUBLE,                                                                                                                                                                                                    |
|   date_string_col STRING,                                                                                                                                                                                               |
|   string_col STRING,                                                                                                                                                                                                    |
|   timestamp_col TIMESTAMP,                                                                                                                                                                                              |
|   year INT                                                                                                                                                                                                              |
| )                                                                                                                                                                                                                       |
| PARTITIONED BY (                                                                                                                                                                                                        |
|   month INT                                                                                                                                                                                                             |
| )                                                                                                                                                                                                                       |
| STORED AS PARQUET                                                                                                                                                                                                       |
| LOCATION 'hdfs://localhost:20500/test-warehouse/unique_database.db/alltypes'                                                                                                                                            |
| TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'OBJCAPABILITIES'='EXTREAD,EXTWRITE', 'STATS_GENERATED'='TASK', 'external.table.purge'='TRUE', 'impala.lastComputeStatsTime'='1615492819', 'numRows'='0', 'totalSize'='0') |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


It can be built via the following in an Impala environment.

create database if not exists unique_database;                                             
use unique_database;                                             
drop table if exists alltypes;                               
CREATE TABLE alltypes
partitioned by (month)
STORED AS PARQUET
as select * from functional_parquet.alltypes 
;

    Environment:     (was: // Some comments here
public String getFoo()
{
    return foo;
})

> The state of unset low or high value in LongColumnStatsData can not be retrieved
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-24885
>                 URL: https://issues.apache.org/jira/browse/HIVE-24885
>             Project: Hive
>          Issue Type: Improvement
>          Components: API
>            Reporter: Qifan Chen
>            Priority: Major
>
> While improving Impala column stats to compute min/max values for columns, we found that the unset state of the low or high value in LongColumnStatsData cannot be retrieved. The following Impala test case, added to MetastoreEventsProcessorTest, illustrates this.
>   /**                                                                                          
>    * Unset the low and the high value first and then check.                                    
>    */                                                                                          
>   @Test                                                                                        
>   public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {                   
>     try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {                           
>       List<String> colNames = new ArrayList<String>();                                         
>       colNames.add("id");                                                                      
>       colNames.add("int_col");                                                                 
>       colNames.add("bigint_col");                                                              
>       List<ColumnStatisticsObj> colStatsObjs =                                             
>           msClient.getHiveClient().getTableColumnStatistics(                                   
>               "unique_database", "alltypes", colNames, "impala");                              
>       for (ColumnStatisticsObj colStatsObj : colStatsObjs) {                                   
>         ColumnStatisticsData colStatsData = colStatsObj.getStatsData();                        
>         LongColumnStatsData longColStatsData = colStatsData.getLongStats();                    
>         longColStatsData.unsetLowValue();                                                      
>         longColStatsData.unsetHighValue();                                                     
>         colStatsData.setLongStats(longColStatsData);                                           
>       }                                                                                        
>       assertTrue("All good!", true);                                                           
>       colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(                        
>           "unique_database", "alltypes", colNames, "impala");                                  
>       for (ColumnStatisticsObj colStatsObj : colStatsObjs) {                                   
>         ColumnStatisticsData colStatsData = colStatsObj.getStatsData();                        
>         LongColumnStatsData longColStatsData = colStatsData.getLongStats();                
>         assertFalse("isSetLowValue() should be false", longColStatsData.isSetLowValue());  
>         assertFalse(                                                                           
>             "isSetHighValue() should be false", longColStatsData.isSetHighValue());            
>       }                                                                                        
>       assertTrue("All good!", true);                                                           
>     // Fail the test on any metastore error instead of letting it pass silently.
>     } catch (NoSuchObjectException e) {
>       fail(String.format("No such object exception: %s", e));
>     } catch (MetaException e) {
>       fail(String.format("Metadata exception: %s", e));
>     } catch (TException e) {
>       fail(String.format("TException: %s", e));
>     }
>   } 
> Both isSetLowValue() and isSetHighValue() are expected to return false in the second loop, since longColStatsData.unsetLowValue() and longColStatsData.unsetHighValue() are called in the first loop.
> To build and run the test:
> mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff -Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue
> Table unique_database.alltypes is defined as follows.
>  CREATE EXTERNAL TABLE unique_database.alltypes (                                             
>    id INT,                                                                                    
>    bool_col BOOLEAN,                                                                          
>    tinyint_col TINYINT,                                                                       
>    smallint_col SMALLINT,                                                                     
>    int_col INT,                                                                               
>    bigint_col BIGINT,                                                                         
>    float_col FLOAT,                                                                           
>    double_col DOUBLE,                                                                         
>    date_string_col STRING,                                                                    
>    string_col STRING,                                                                         
>    timestamp_col TIMESTAMP,                                                                   
>    year INT                                                                                   
>  )                                                                                            
>  PARTITIONED BY (                                                                             
>    month INT                                                                                  
>  )                                                                                            
>  STORED AS PARQUET                                                                            
>  LOCATION 'hdfs://localhost:20500/test-warehouse/unique_database.db/alltypes'                 
>  TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'OBJCAPABILITIES'='EXTREAD,EXTWRITE', 'STATS_GENERATED'='TASK', 'external.table.purge'='TRUE', 'impala.lastComputeStatsTime'='1615492819', 'numRows'='0', 'totalSize'='0')  
> The table can be created with the following statements in an Impala environment.
> create database if not exists unique_database;                                             
> use unique_database;                                             
> drop table if exists alltypes;                               
> CREATE TABLE alltypes
> partitioned by (month)
> STORED AS PARQUET
> as select * from functional_parquet.alltypes 
> ;



--
This message was sent by Atlassian Jira
(v8.3.4#803005)