You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/11/23 18:31:00 UTC

[jira] [Work logged] (HIVE-24397) Add the projection specification to the table request object and add placeholders in ObjectStore.java

     [ https://issues.apache.org/jira/browse/HIVE-24397?focusedWorklogId=515709&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-515709 ]

ASF GitHub Bot logged work on HIVE-24397:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Nov/20 18:30
            Start Date: 23/Nov/20 18:30
    Worklog Time Spent: 10m 
      Work Description: vihangk1 commented on a change in pull request #1681:
URL: https://github.com/apache/hive/pull/1681#discussion_r528908297



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -1855,43 +1869,81 @@ private MTable getMTable(String catName, String db, String table) {
       for (String t : tbl_names) {
         lowered_tbl_names.add(normalizeIdentifier(t));
       }
+
       query = pm.newQuery(MTable.class);
       query.setFilter("database.name == db && database.catalogName == cat && tbl_names.contains(tableName)");
       query.declareParameters("java.lang.String db, java.lang.String cat, java.util.Collection tbl_names");
-      Collection mtables = (Collection) query.execute(db, catName, lowered_tbl_names);
-      if (mtables == null || mtables.isEmpty()) {
-        // Need to differentiate between an unmatched pattern and a non-existent database
-        dbExistsQuery = pm.newQuery(MDatabase.class, "name == db && catalogName == cat");
-        dbExistsQuery.declareParameters("java.lang.String db, java.lang.String cat");
-        dbExistsQuery.setUnique(true);
-        dbExistsQuery.setResult("name");
-        String dbNameIfExists = (String) dbExistsQuery.execute(db, catName);
-        if (org.apache.commons.lang3.StringUtils.isEmpty(dbNameIfExists)) {
-          throw new UnknownDBException("Could not find database " +
-              DatabaseName.getQualified(catName, db));
+
+      if (projectionSpec == null) {
+        mtables = (List<MTable>) query.execute(db, catName, lowered_tbl_names);
+      }
+      else if(projectionSpec.getFieldList() != null && projectionSpec.getFieldList().size() > 1) {

Review comment:
       why do we need to code blockes for projectionSpec fieldList size > 1 and == 1? the code looks very similar.

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
##########
@@ -733,6 +733,27 @@ Table getTable(String catName, String dbName, String tableName,
   List<Table> getTableObjectsByName(String dbName, List<String> tableNames)
       throws MetaException, InvalidOperationException, UnknownDBException, TException;
 
+
+  /**
+   * Get tables as objects (rather than just fetching their names).  This is more expensive and
+   * should only be used if you actually need all the information about the tables.
+   * @param request GetTablesRequest Object.
+   * @return A list of objects representing the tables.
+   *          Only the tables that can be retrieved from the database are returned.  For example,
+   *          if none of the requested tables could be retrieved, an empty list is returned.
+   *          There is no guarantee of ordering of the returned tables.
+   * @throws InvalidOperationException
+   *          The input to this operation is invalid (e.g., the list of tables names is null)
+   * @throws UnknownDBException
+   *          The requested database could not be fetched.
+   * @throws TException
+   *          A thrift communication error occurred
+   * @throws MetaException
+   *          Any other errors
+   */
+  List<Table> getTableObjectsByRequest(GetTablesRequest request)

Review comment:
       Does the method name have to include byRequest? Its self-evident by the signature right? How about getTables(GetTablesRequest request). Also, can we keep the return type as GetTablesResponse here so that the interface is more extendable in the future?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
##########
@@ -1855,43 +1869,81 @@ private MTable getMTable(String catName, String db, String table) {
       for (String t : tbl_names) {
         lowered_tbl_names.add(normalizeIdentifier(t));
       }
+
       query = pm.newQuery(MTable.class);
       query.setFilter("database.name == db && database.catalogName == cat && tbl_names.contains(tableName)");
       query.declareParameters("java.lang.String db, java.lang.String cat, java.util.Collection tbl_names");
-      Collection mtables = (Collection) query.execute(db, catName, lowered_tbl_names);
-      if (mtables == null || mtables.isEmpty()) {
-        // Need to differentiate between an unmatched pattern and a non-existent database
-        dbExistsQuery = pm.newQuery(MDatabase.class, "name == db && catalogName == cat");
-        dbExistsQuery.declareParameters("java.lang.String db, java.lang.String cat");
-        dbExistsQuery.setUnique(true);
-        dbExistsQuery.setResult("name");
-        String dbNameIfExists = (String) dbExistsQuery.execute(db, catName);
-        if (org.apache.commons.lang3.StringUtils.isEmpty(dbNameIfExists)) {
-          throw new UnknownDBException("Could not find database " +
-              DatabaseName.getQualified(catName, db));
+
+      if (projectionSpec == null) {
+        mtables = (List<MTable>) query.execute(db, catName, lowered_tbl_names);
+      }
+      else if(projectionSpec.getFieldList() != null && projectionSpec.getFieldList().size() > 1) {
+        // fetch partially filled tables using result clause
+        query.setResult(Joiner.on(',').join(projectionSpec.getFieldList()));

Review comment:
       Can we add a validation method before actually executing the query? That way you don't hit the database query if the projectionSpec has invalid fields and throw an error early. The performance is better and code becomes cleaner since you avoid the logic at 1924-1915 if you validate the fields first.

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesGetExists.java
##########
@@ -402,6 +393,134 @@ public void testGetTableObjectsByName() throws Exception {
 
   }
 
+  @Test
+  public void testGetTableObjectsWithProjectionOfSingleField() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Collections.singletonList("sd.location");
+    projectSpec.setFieldList(projectedFields);
+
+    List<Table> tables = client.getTableObjectsByRequest(request);
+
+    Assert.assertEquals("Found tables", 2, tables.size());
+
+    for(Table table : tables) {
+      Assert.assertFalse(table.isSetDbName());
+      Assert.assertFalse(table.isSetCatName());
+      Assert.assertFalse(table.isSetTableName());
+      Assert.assertTrue(table.isSetSd());
+    }
+  }
+
+  @Test
+  public void testGetTableObjectsWithNullProjectionSpec() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(null);
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    List<Table> tables = client.getTableObjectsByRequest(request);
+
+    Assert.assertEquals("Found tables", 2, tables.size());
+  }
+
+  @Test
+  public void testGetTableObjectsWithNonExistentColumn() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList("Invalid1");
+    projectSpec.setFieldList(projectedFields);
+
+    Assert.assertThrows(Exception.class, ()->client.getTableObjectsByRequest(request));
+  }
+
+
+  @Test
+  public void testGetTableObjectsWithNonExistentColumns() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList("Invalid1", "Invalid2");
+    projectSpec.setFieldList(projectedFields);
+
+    Assert.assertThrows(Exception.class, ()->client.getTableObjectsByRequest(request));
+  }
+
+  @Test
+  public void testGetTableObjectsWithEmptyProjection() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList();
+    projectSpec.setFieldList(projectedFields);
+
+    List<Table> tables = client.getTableObjectsByRequest(request);
+
+    Assert.assertEquals("Found tables", 0, tables.size());
+  }
+
+  @Test
+  public void testGetTableObjectsWithProjectionOfMultipleField() throws Exception {
+    List<String> tableNames = new ArrayList<>();
+    tableNames.add(testTables[0].getTableName());
+    tableNames.add(testTables[1].getTableName());
+
+    GetTablesRequest request = new GetTablesRequest();
+    request.setProjectionSpec(new GetProjectionsSpec());
+    request.setTblNames(tableNames);
+    request.setDbName(DEFAULT_DATABASE);
+
+    GetProjectionsSpec projectSpec = request.getProjectionSpec();
+    List<String> projectedFields = Arrays.asList("database", "tableName", "createTime", "lastAccessTime");

Review comment:
       can we add some nested single valued fields here too. Something other than sd.location? like for example. sd.serdeInfo.serializationLib? Also, what is the expected behavior if the user adds a multi-valued field like sd.parameters. Can we have some tests around that too.

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
##########
@@ -2360,6 +2360,20 @@ public Table getTable(String catName, String dbName, String tableName, String va
     return deepCopyTables(FilterUtils.filterTablesIfEnabled(isClientFilterEnabled, filterHook, tabs));
   }
 
+  @Override
+  public List<Table> getTableObjectsByRequest(GetTablesRequest req) throws TException {

Review comment:
       Do you need to do any special handling of SessionMetastoreClient?

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1454,7 +1454,8 @@ struct GetTablesRequest {
   3: optional ClientCapabilities capabilities,
   4: optional string catName,
   5: optional list<string> processorCapabilities,
-  6: optional string processorIdentifier
+  6: optional string processorIdentifier,
+  7: optional GetProjectionsSpec projectionSpec

Review comment:
       can you add a comment here saying what is support and what is not? For example, ProjectionSpec has fields to filter the parameters too. But we don't support it here. The code should also validate such cases and throw a MetaException with appropriate error message.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 515709)
    Remaining Estimate: 0h
            Time Spent: 10m

> Add the projection specification to the table request object and add placeholders in ObjectStore.java
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24397
>                 URL: https://issues.apache.org/jira/browse/HIVE-24397
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Narayanan Venkateswaran
>            Assignee: Narayanan Venkateswaran
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)