You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/07/29 17:53:00 UTC

[jira] [Work logged] (HIVE-23890) Create HMS endpoint for querying file lists using FlatBuffers as serialization

     [ https://issues.apache.org/jira/browse/HIVE-23890?focusedWorklogId=464240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-464240 ]

ASF GitHub Bot logged work on HIVE-23890:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Jul/20 17:52
            Start Date: 29/Jul/20 17:52
    Worklog Time Spent: 10m 
      Work Description: pvary commented on a change in pull request #1330:
URL: https://github.com/apache/hive/pull/1330#discussion_r462178379



##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
##########
@@ -191,6 +191,20 @@ void dropCatalog(String catName)
   List<String> getTables(String dbName, String tablePattern)
       throws MetaException, TException, UnknownDBException;
 
+
+  /**
+   * Get the files from the filesystem that the specified table/partition contains

Review comment:
       nit: javadoc complains if there is no '.' at the end of the first sentence :)

##########
File path: standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/fb/FbFileStatus.java
##########
@@ -0,0 +1,51 @@
+// automatically generated by the FlatBuffers compiler, do not modify

Review comment:
       Rat check did not complain?

##########
File path: standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
##########
@@ -1861,6 +1861,18 @@ struct ScheduledQueryProgressInfo{
   4: optional string errorMessage,
 }
 
+struct GetFileListRequest {
+  1: optional string catName,
+  2: optional string dbName,
+  3: optional string tableName,
+  4: optional list<string> partVals,
+  6: optional string validWriteIdList
+}
+
+struct GetFileListResponse {
+  1: required list<binary> fileListData

Review comment:
       Should be optional, so if we decide to change something then we can change easily.
   
   Maybe based on @vihangk1 's suggestion, we should add a version number here for the FB data. So if we change content of the binary later it could be done easily

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##########
@@ -75,11 +75,10 @@
 import com.google.common.base.Suppliers;
 import com.google.common.collect.Lists;
 
+import com.google.flatbuffers.FlatBufferBuilder;
 import org.apache.commons.cli.OptionBuilder;
 import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileStatus;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.*;

Review comment:
       nit: Please avoid '*' imports

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##########
@@ -5685,6 +5685,63 @@ private void alter_table_core(String catName, String dbname, String name, Table
       }
     }
 
+    @Override
+    public GetFileListResponse get_file_list(GetFileListRequest req) throws MetaException {
+      String catName = req.isSetCatName() ? req.getCatName() : getDefaultCatalog(conf);
+      String dbName = req.getDbName();
+      String tblName = req.getTableName();
+      List<String> partitions = req.getPartVals();
+      // Will be used later, when cache is introduced
+      String validWriteIdList = req.getValidWriteIdList();
+
+      startFunction("get_file_list", ": " + TableName.getQualified(catName, dbName, tblName)
+              + ", partitions: " + partitions.toString());
+
+
+      GetFileListResponse response = new GetFileListResponse();
+
+      boolean success = false;
+      Exception ex = null;
+      try {
+        Partition p =  getMS().getPartition(catName, dbName, tblName, partitions);

Review comment:
       Uhhh... this will be costly - any way to avoid it?

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestGetFileList.java
##########
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.fs.*;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.*;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.apache.hadoop.hive.metastore.fb.FbFileStatus;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.junit.*;

Review comment:
       nit: avoid wildcard imports please

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestGetFileList.java
##########
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.fs.*;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.*;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.apache.hadoop.hive.metastore.fb.FbFileStatus;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.junit.*;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import java.nio.ByteBuffer;
+import java.nio.file.Files;
+import java.util.*;

Review comment:
       nit: avoid wildcard imports please

##########
File path: standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestGetFileList.java
##########
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.metastore.client;
+
+import org.apache.hadoop.fs.*;
+import org.apache.hadoop.hive.metastore.IMetaStoreClient;
+import org.apache.hadoop.hive.metastore.annotation.MetastoreCheckinTest;
+import org.apache.hadoop.hive.metastore.api.*;
+import org.apache.hadoop.hive.metastore.client.builder.DatabaseBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.PartitionBuilder;
+import org.apache.hadoop.hive.metastore.client.builder.TableBuilder;
+import org.apache.hadoop.hive.metastore.conf.MetastoreConf.ConfVars;
+import org.apache.hadoop.hive.metastore.fb.FbFileStatus;
+import org.apache.hadoop.hive.metastore.minihms.AbstractMetaStoreService;
+import org.junit.*;
+import org.junit.experimental.categories.Category;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import java.nio.ByteBuffer;
+import java.nio.file.Files;
+import java.util.*;
+
+/**
+ * Tests for listing files.
+ */
+@RunWith(Parameterized.class)
+@Category(MetastoreCheckinTest.class)
+public class TestGetFileList extends MetaStoreClientTest {
+  private AbstractMetaStoreService metaStore;
+  private IMetaStoreClient client;
+  private FileSystem fs;
+
+  private static String hiveWarehouse;
+
+  protected static final String DB_NAME = "test_getfile_db";
+  protected static final String TABLE_NAME = "test_getfile_table";
+  protected static final String FILE_NAME = "sample";
+  protected static final String PARTITION1 = "partcol1";
+  protected static final String PARTITION2 = "partcol2";
+  protected static final int NUM_PARTITIONS = 2;
+  protected static final int NUM_FILES_IN_PARTITION  = 5;
+
+
+  public TestGetFileList(String name, AbstractMetaStoreService metaStore) {
+    this.metaStore = metaStore;
+  }
+
+  @BeforeClass
+  public static void startMetaStores() {
+    Map<ConfVars, String> msConf = new HashMap<>();
+    Map<String, String> extraConf = new HashMap<>();
+    extraConf.put(ConfVars.HIVE_IN_TEST.getVarname(), "true");
+    startMetaStores(msConf, extraConf);
+  }
+
+  @Before
+  public void setUp() throws Exception {
+    // Get new client
+    client = metaStore.getClient();
+
+    // Clean up the database
+    client.dropDatabase(DB_NAME, true, true, true);
+    metaStore.cleanWarehouseDirs();
+
+    //Create files in local filesystem
+    fs = new LocalFileSystem();
+    java.nio.file.Path tempDirWithPrefix = Files.createTempDirectory("GetFileTest_");
+    hiveWarehouse = tempDirWithPrefix.toString();
+    fs.initialize(tempDirWithPrefix.toUri(), metaStore.getConf());
+  }
+
+  @After
+  public void tearDown() throws Exception {
+    try {
+        client.close();
+    } finally {
+      client = null;
+    }
+  }
+
+  protected AbstractMetaStoreService getMetaStore() {
+    return metaStore;
+  }
+
+  protected IMetaStoreClient getClient() {
+    return client;
+  }
+
+  protected void setClient(IMetaStoreClient client) {
+    this.client = client;
+  }
+
+  @Test
+  public void testGetFileListResultCountOnePartition() throws Exception {
+    createDatabaseOnePartition();
+    List<String> partVals = new ArrayList<>();
+    partVals.add("a1");
+    GetFileListResponse resp = getClient().getFileList(null, DB_NAME, TABLE_NAME, partVals,  "");
+    Assert.assertEquals(NUM_FILES_IN_PARTITION, resp.getFileListDataSize());
+  }
+
+  @Test
+  public void testGetFileListResultOnePartition() throws Exception {
+    createDatabaseOnePartition();
+    List<String> partVals = new ArrayList<>();
+    partVals.add("a1");
+    GetFileListResponse resp = getClient().getFileList(null, DB_NAME, TABLE_NAME, partVals,  "");
+    List<String> expectedPaths = new ArrayList<>();
+    for (int i = 0; i < NUM_FILES_IN_PARTITION; i++) {
+      expectedPaths.add("/" + FILE_NAME + i);
+    }
+
+    List<String> actualPaths = new ArrayList<>();
+    for (ByteBuffer buffer: resp.getFileListData()) {
+      FbFileStatus fileStatus = FbFileStatus.getRootAsFbFileStatus(buffer);
+      actualPaths.add(fileStatus.relativePath());
+    }
+    Collections.sort(actualPaths);
+    Assert.assertArrayEquals(expectedPaths.toArray(), actualPaths.toArray());
+  }
+
+  @Test
+  public void testGetFileListResultMultiPartition() throws Exception {
+    createDatabaseMultiPartition();
+    List<String> partVals = new ArrayList<>();
+    partVals.add("a1");
+    partVals.add("1");
+    GetFileListResponse resp = getClient().getFileList(null, DB_NAME, TABLE_NAME, partVals,  "");
+    List<String> expectedPaths = new ArrayList<>();
+    for (int i = 0; i < NUM_FILES_IN_PARTITION; i++) {
+      expectedPaths.add("/" + FILE_NAME + i);
+    }
+
+    List<String> actualPaths = new ArrayList<>();
+    for (ByteBuffer buffer: resp.getFileListData()) {
+      FbFileStatus fileStatus = FbFileStatus.getRootAsFbFileStatus(buffer);
+      actualPaths.add(fileStatus.relativePath());
+    }
+    Collections.sort(actualPaths);
+    Assert.assertArrayEquals(expectedPaths.toArray(), actualPaths.toArray());
+  }
+
+  @Test
+  public void testGetFileListNoResult() throws Exception {
+    createDatabaseOnePartition();
+    List<String> partVals = new ArrayList<>();
+    partVals.add("a2");
+    Assert.assertThrows(MetaException.class,
+            () -> getClient().getFileList(null, DB_NAME, TABLE_NAME, partVals,  ""));
+  }
+
+
+
+  private void createDatabaseOnePartition() throws Exception {
+    String path = hiveWarehouse;
+
+    Database db = new DatabaseBuilder().
+            setName(DB_NAME).
+            setLocation(path).

Review comment:
       Why do you need to set the location?
   Weren't default location working for this test?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
##########
@@ -5685,6 +5685,63 @@ private void alter_table_core(String catName, String dbname, String name, Table
       }
     }
 
+    @Override
+    public GetFileListResponse get_file_list(GetFileListRequest req) throws MetaException {
+      String catName = req.isSetCatName() ? req.getCatName() : getDefaultCatalog(conf);
+      String dbName = req.getDbName();
+      String tblName = req.getTableName();
+      List<String> partitions = req.getPartVals();
+      // Will be used later, when cache is introduced
+      String validWriteIdList = req.getValidWriteIdList();
+
+      startFunction("get_file_list", ": " + TableName.getQualified(catName, dbName, tblName)
+              + ", partitions: " + partitions.toString());
+
+
+      GetFileListResponse response = new GetFileListResponse();
+
+      boolean success = false;
+      Exception ex = null;
+      try {
+        Partition p =  getMS().getPartition(catName, dbName, tblName, partitions);

Review comment:
       Will be solved by the cache




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 464240)
    Remaining Estimate: 0h
            Time Spent: 10m

> Create HMS endpoint for querying file lists using FlatBuffers as serialization
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-23890
>                 URL: https://issues.apache.org/jira/browse/HIVE-23890
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Barnabas Maidics
>            Assignee: Barnabas Maidics
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> New thrift objects would be:
> {code:java}
> struct GetFileListRequest {
>     1: optional string catName,
>     2: required string dbName,
>     3: required string tableName,
>     4: required list<string> partVals,
>     6: optional string validWriteIdList
> }
> struct GetFileListResponse {
>     1: required binary fileListData
> }
> {code}
> Where GetFileListResponse contains a binary field, which would be a FlatBuffer object



--
This message was sent by Atlassian Jira
(v8.3.4#803005)