Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/07/14 15:33:25 UTC

[GitHub] [hive] jfsii opened a new pull request, #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

jfsii opened a new pull request, #3443:
URL: https://github.com/apache/hive/pull/3443

   
   ### What changes were proposed in this pull request? 
   Add support for inferring a table's schema from a Parquet file through a new CREATE TABLE ... LIKE FILE statement, for example:
   CREATE TABLE like_test_all_types LIKE FILE PARQUET '${system:test.tmp.dir}/test_all_types/000000_0';
   
   
   
   ### Why are the changes needed?
   It lets users create tables from already existing files more easily. Working out the schema by hand is sometimes difficult or burdensome.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. It adds a new form of CREATE TABLE LIKE: CREATE TABLE ... LIKE FILE PARQUET '<path>'.
   
   
   ### How was this patch tested?
   Using the ptest framework and by testing on a cluster against S3.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] jfsii commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
jfsii commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r921511436


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -13477,6 +13480,23 @@ private boolean hasConstraints(final List<FieldSchema> partCols, final List<SQLD
     }
     return false;
   }
+
+  boolean doesSupportSchemaInference(String fileFormat) throws SemanticException {
+    StorageFormatFactory storageFormatFactory = new StorageFormatFactory();
+    StorageFormatDescriptor descriptor = storageFormatFactory.get(fileFormat);
+    if (descriptor == null) {
+      throw new SemanticException("CREATE TABLE LIKE FILE is not supported by the '" + likeFileFormat + "' file format");

Review Comment:
   I've added ErrorMsg entries for the changes in SemanticAnalyzer and CreateTableOperation.
   Should I do the same for the SerDe error messages too?
   (I also fixed the compilation error here; I guess that is what I get for a last-minute change.)





[GitHub] [hive] kasakrisz commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r921882485


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -13477,6 +13480,23 @@ private boolean hasConstraints(final List<FieldSchema> partCols, final List<SQLD
     }
     return false;
   }
+
+  boolean doesSupportSchemaInference(String fileFormat) throws SemanticException {
+    StorageFormatFactory storageFormatFactory = new StorageFormatFactory();
+    StorageFormatDescriptor descriptor = storageFormatFactory.get(fileFormat);
+    if (descriptor == null) {
+      throw new SemanticException("CREATE TABLE LIKE FILE is not supported by the '" + likeFileFormat + "' file format");

Review Comment:
   It seems ErrorMsg is not used in the hive-serde project.





[GitHub] [hive] jfsii commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
jfsii commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r922223795


##########
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java:
##########
@@ -64,16 +64,13 @@ public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context,
     // Analyze and create tbl properties object
     int numCh = ast.getChildCount();
 
-    tableName = BaseSemanticAnalyzer.getUnescapedName((ASTNode) ast

Review Comment:
   mvn test -Dtest=TestSemanticAnalysis





[GitHub] [hive] kasakrisz merged pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
kasakrisz merged PR #3443:
URL: https://github.com/apache/hive/pull/3443




[GitHub] [hive] jfsii commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
jfsii commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r921510384


##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java:
##########
@@ -234,4 +251,161 @@ StructTypeInfo prune() {
       return (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(newNames, newTypes);
     }
   }
+
+  // ReadSchema interface implementation
+  private String convertGroupType(GroupType group) throws SerDeException {
+    boolean first = true;
+    StringBuilder sb = new StringBuilder(serdeConstants.STRUCT_TYPE_NAME + "<");
+    for (Type field: group.getFields()) {
+      if (first) {
+        first = false;
+      } else {
+        sb.append(",");
+      }
+      // fieldName:typeName
+      sb.append(field.getName() + ":" + convertParquetTypeToFieldType(field));

Review Comment:
   Done.





[GitHub] [hive] amansinha100 commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
amansinha100 commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r922878384


##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java:
##########
@@ -53,8 +59,33 @@ public CreateTableOperation(DDLOperationContext context, CreateTableDesc desc) {
     super(context, desc);
   }
 
+  private void readSchemaFromFile() throws HiveException {

Review Comment:
   nit: Add a brief comment since this is the main method for schema inference.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -13477,6 +13480,23 @@ private boolean hasConstraints(final List<FieldSchema> partCols, final List<SQLD
     }
     return false;
   }
+
+  boolean doesSupportSchemaInference(String fileFormat) throws SemanticException {

Review Comment:
   Please add a comment. Also, this could be a public static utility method that takes the fileFormat and conf parameters.
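
A minimal sketch of the static-utility shape suggested above. It is not the Hive code: `InfersSchema`, `ParquetLikeSerDe`, `AvroLikeSerDe`, `SchemaInferenceUtil` and the `serdeByFormat` map are hypothetical stand-ins for the SerDe interface, the SerDe classes and the factory/conf lookup, chosen so the example compiles on its own.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical marker for a SerDe that can infer a schema; not a Hive class.
interface InfersSchema {}

// Hypothetical SerDe classes used only for this illustration.
class ParquetLikeSerDe implements InfersSchema {}
class AvroLikeSerDe {}

final class SchemaInferenceUtil {
  private SchemaInferenceUtil() {}

  /**
   * Returns true if the SerDe registered for the given format can infer a schema.
   *
   * @param fileFormat format name taken from the statement, e.g. "PARQUET"
   * @param serdeByFormat stand-in for the factory/conf lookup (an assumption)
   * @return true if the registered class implements InfersSchema
   * @throws IllegalArgumentException if no SerDe is registered for the format
   */
  static boolean doesSupportSchemaInference(String fileFormat, Map<String, Class<?>> serdeByFormat) {
    Class<?> serde = serdeByFormat.get(fileFormat.toUpperCase(Locale.ROOT));
    if (serde == null) {
      throw new IllegalArgumentException(
          "CREATE TABLE LIKE FILE is not supported by the '" + fileFormat + "' file format");
    }
    return InfersSchema.class.isAssignableFrom(serde);
  }
}
```

Because the method holds no instance state, it can be called from both the analysis and the execution side without constructing an analyzer.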



##########
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java:
##########
@@ -88,10 +85,6 @@ public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context,
       case HiveParser.TOK_ALTERTABLE_BUCKETS:
         break;
 
-      case HiveParser.TOK_LIKETABLE:

Review Comment:
   Since this patch removes this LIKE TABLE support from the hcatalog (presumably because it is not used), it would be good to mention this explicitly in the commit message. 



##########
serde/src/java/org/apache/hadoop/hive/serde2/SchemaInference.java:
##########
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.serde2;
+
+import org.apache.hadoop.hive.metastore.api.FieldSchema;
+import java.util.List;
+import org.apache.hadoop.conf.Configuration;
+
+public interface SchemaInference {
+  /**
+   * Infer Hive compatible schema from provided file. The purpose of this method is to optionally
+   * allow SerDes to implement schema inference for CREATE TABLE LIKE FILE support.
+   *
+   * @param conf Hadoop Configuration
+   * @param file Fully qualified path to file to infer schema from (hadoop compatible URI + filename)
+   * @return List of FieldSchema that was derived from the provided file

Review Comment:
   nit: Add a @throws for the SerDeException
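
For illustration only, a sketch of what a Javadoc block with the `@throws` tag could look like. `InferenceException` and `FileSchemaInference` are hypothetical stand-ins for `SerDeException` and the interface under review, and the `"name:type"` string form is an assumption made to keep the example self-contained.

```java
import java.util.List;

// Hypothetical stand-in for SerDeException.
class InferenceException extends Exception {
  InferenceException(String message) {
    super(message);
  }
}

interface FileSchemaInference {
  /**
   * Infer column definitions from the provided file.
   *
   * @param file fully qualified path of the file to inspect
   * @return column definitions, each in "name:type" form
   * @throws InferenceException if the file cannot be read or holds an unsupported type
   */
  List<String> readSchema(String file) throws InferenceException;
}
```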



##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java:
##########
@@ -53,8 +59,33 @@ public CreateTableOperation(DDLOperationContext context, CreateTableDesc desc) {
     super(context, desc);
   }
 
+  private void readSchemaFromFile() throws HiveException {
+    String fileFormat = desc.getLikeFileFormat();
+    StorageFormatFactory storageFormatFactory = new StorageFormatFactory();
+    StorageFormatDescriptor descriptor = storageFormatFactory.get(fileFormat);
+    if (descriptor == null) {
+      // normal operation should never hit this since analysis has already verified this exists

Review Comment:
   The comment here seems to indicate this should be an assert rather than an exception. 
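
To make the trade-off concrete, here is a small sketch of the two options. `DescriptorLookup` and its registry are hypothetical and only stand in for the storage-format lookup. A Java `assert` is evaluated only when the JVM runs with `-ea`, while an explicit exception is always checked.

```java
import java.util.Map;

final class DescriptorLookup {
  // Hypothetical registry standing in for a storage-format factory.
  private static final Map<String, String> SERDE_BY_FORMAT =
      Map.of("PARQUET", "ParquetLikeSerDe");

  private DescriptorLookup() {}

  // Variant 1: assertion. Documents the invariant; skipped unless -ea is set.
  static String lookupWithAssert(String format) {
    String serde = SERDE_BY_FORMAT.get(format);
    assert serde != null : "analysis should already have rejected " + format;
    return serde;
  }

  // Variant 2: exception. Always checked, including in production runs.
  static String lookupWithException(String format) {
    String serde = SERDE_BY_FORMAT.get(format);
    if (serde == null) {
      throw new IllegalStateException("No storage format descriptor for " + format);
    }
    return serde;
  }
}
```

With assertions disabled, the first variant returns null for an unknown format and the failure surfaces later, which is one reason to keep an exception on paths that can be reached outside normal analysis.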



##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java:
##########
@@ -234,4 +251,161 @@ StructTypeInfo prune() {
       return (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(newNames, newTypes);
     }
   }
+
+  // ReadSchema interface implementation
+  private String convertGroupType(GroupType group) throws SerDeException {
+    boolean first = true;
+    StringBuilder sb = new StringBuilder(serdeConstants.STRUCT_TYPE_NAME + "<");
+    for (Type field: group.getFields()) {
+      if (first) {
+        first = false;
+      } else {
+        sb.append(",");
+      }
+      // fieldName:typeName
+      sb.append(field.getName()).append(":").append(convertParquetTypeToFieldType(field));
+    }
+    sb.append(">");
+    // struct<fieldName1:int, fieldName2:map<string : int>, etc
+    return sb.toString();
+  }
+
+  private String convertPrimitiveType(PrimitiveType primitive) throws SerDeException {
+    switch (primitive.getPrimitiveTypeName()) {
+      case INT96:
+        return serdeConstants.TIMESTAMP_TYPE_NAME;
+      case INT32:
+        return serdeConstants.INT_TYPE_NAME;
+      case INT64:
+        return serdeConstants.BIGINT_TYPE_NAME;
+      case BOOLEAN:
+        return serdeConstants.BOOLEAN_TYPE_NAME;
+      case FLOAT:
+        return serdeConstants.FLOAT_TYPE_NAME;
+      case DOUBLE:
+        return serdeConstants.DOUBLE_TYPE_NAME;
+      case BINARY:
+        return serdeConstants.BINARY_TYPE_NAME;
+      default:
+        throw new SerDeException("Unhandled parquet primitive type " + primitive.getPrimitiveTypeName());
+    }
+  }
+
+  private String convertParquetIntLogicalType(Type parquetType) throws SerDeException {
+    IntLogicalTypeAnnotation intLogicalType = (IntLogicalTypeAnnotation) parquetType.getLogicalTypeAnnotation();
+    PrimitiveType primitiveType = parquetType.asPrimitiveType();
+    // check to see if primitive type handling is implemented
+    switch (primitiveType.getPrimitiveTypeName()) {
+      case INT32:
+      case INT64:
+        break;
+      default:
+        throw new SerDeException("Unhandled parquet int logical type " +  intLogicalType);
+    }
+
+    if (!intLogicalType.isSigned()) {
+      throw new SerDeException("Unhandled parquet int logical type (unsigned types are not supported) " + intLogicalType);
+    }
+
+    switch (intLogicalType.getBitWidth()) {
+      case 8: return serdeConstants.TINYINT_TYPE_NAME;
+      case 16: return serdeConstants.SMALLINT_TYPE_NAME;
+      case 32: return serdeConstants.INT_TYPE_NAME;
+      case 64: return serdeConstants.BIGINT_TYPE_NAME;
+    }
+
+    throw new SerDeException("Unhandled parquet int logical type " + intLogicalType);
+  }
+
+  private String createMapType(String keyType, String valueType) {
+    // examples: map<string, int>, map<string : struct<i : int>>
+    return serdeConstants.MAP_TYPE_NAME + "<" + keyType + "," + valueType + ">";
+  }
+
+  private String convertParquetMapLogicalTypeAnnotation(Type parquetType) throws SerDeException {
+    MapLogicalTypeAnnotation mType = (MapLogicalTypeAnnotation) parquetType.getLogicalTypeAnnotation();
+    GroupType gType = parquetType.asGroupType();
+    Type innerField = gType.getType(0);
+    GroupType innerGroup = innerField.asGroupType();
+    Type key = innerGroup.getType(0);
+    Type value = innerGroup.getType(1);
+    return createMapType(convertParquetTypeToFieldType(key), convertParquetTypeToFieldType(value));
+  }
+
+  private String createArrayType(String fieldType) {
+    // examples: array<int>, array<struct<i:int>>, array<map<string : int>>
+    return serdeConstants.LIST_TYPE_NAME + "<" + fieldType + ">";
+  }
+
+  private String convertParquetListLogicalTypeAnnotation(Type parquetType) throws SerDeException {
+    ListLogicalTypeAnnotation mType = (ListLogicalTypeAnnotation) parquetType.getLogicalTypeAnnotation();
+    GroupType gType = parquetType.asGroupType();
+    Type innerField = gType.getType(0);
+    if (innerField.isPrimitive() || innerField.getOriginalType() != null) {
+      return createArrayType(convertParquetTypeToFieldType(innerField));
+    }
+
+    GroupType innerGroup = innerField.asGroupType();
+    if (innerGroup.getFieldCount() != 1) {
+      return createArrayType(convertGroupType(innerGroup));
+    }
+
+    return createArrayType(convertParquetTypeToFieldType(innerGroup.getType(0)));
+  }
+
+  private String createDecimalType(int precision, int scale) {
+    // example: decimal(10, 4)
+    return serdeConstants.DECIMAL_TYPE_NAME + "(" + precision + "," + scale + ")";
+  }
+
+  private String convertLogicalType(Type type) throws SerDeException {
+    LogicalTypeAnnotation lType = type.getLogicalTypeAnnotation();
+    if (lType instanceof IntLogicalTypeAnnotation) {
+      return convertParquetIntLogicalType(type);
+    } else if (lType instanceof StringLogicalTypeAnnotation) {
+      return serdeConstants.STRING_TYPE_NAME;
+    } else if (lType instanceof DecimalLogicalTypeAnnotation) {
+      DecimalLogicalTypeAnnotation dType = (DecimalLogicalTypeAnnotation) lType;
+      return createDecimalType(dType.getPrecision(), dType.getScale());
+    } else if (lType instanceof MapLogicalTypeAnnotation) {
+      return convertParquetMapLogicalTypeAnnotation(type);
+    } else if (lType instanceof ListLogicalTypeAnnotation) {
+      return convertParquetListLogicalTypeAnnotation(type);
+    } else if (lType instanceof DateLogicalTypeAnnotation) {
+      // assuming 32 bit int
+      return serdeConstants.DATE_TYPE_NAME;
+    }
+    throw new SerDeException("Unhandled logical type "  + lType);
+  }
+
+  private String convertParquetTypeToFieldType(Type type) throws SerDeException {
+    if (type.getLogicalTypeAnnotation() != null) {
+      return convertLogicalType(type);
+    } else if (type.isPrimitive()) {
+      return convertPrimitiveType(type.asPrimitiveType());
+    }
+    return convertGroupType(type.asGroupType());
+  }
+
+  private FieldSchema convertParquetTypeToFieldSchema(Type type) throws SerDeException {
+    String columnName = type.getName();
+    String typeName = convertParquetTypeToFieldType(type);
+    return new FieldSchema(columnName, typeName, "Inferred from Parquet file.");
+  }
+
+  public List<FieldSchema> readSchema(Configuration conf, String file) throws SerDeException {
+      ParquetMetadata footer;
+      try {
+        footer = ParquetFileReader.readFooter(conf, new Path(file), ParquetMetadataConverter.NO_FILTER);
+      } catch (Exception e) {
+        throw new SerDeException("Failed to read parquet footer:", e);
+      }
+
+      MessageType msg = footer.getFileMetaData().getSchema();
+      List<FieldSchema> schema = new ArrayList<>();
+      for (Type field: msg.getFields()) {
+        schema.add(convertParquetTypeToFieldSchema(field));
+      }
+      return schema;

Review Comment:
   For debuggability, you may want to add a DEBUG level log message for the inferred schema.
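
A rough sketch of such a debug message. It uses `java.util.logging` with `Level.FINE` as a stand-in so the example is self-contained; the logging framework actually used by the project may differ. `InferredSchemaLogging` and the `String[] {name, type}` column form are hypothetical.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class InferredSchemaLogging {
  private static final Logger LOG = Logger.getLogger(InferredSchemaLogging.class.getName());

  private InferredSchemaLogging() {}

  // Renders columns as "name type, name type" so the schema fits on one log line.
  static String describe(List<String[]> columns) {
    StringBuilder sb = new StringBuilder();
    for (String[] col : columns) {
      if (sb.length() > 0) {
        sb.append(", ");
      }
      sb.append(col[0]).append(' ').append(col[1]);
    }
    return sb.toString();
  }

  static void logSchema(String file, List<String[]> columns) {
    // Guard so the string is only built when debug-level logging is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("Inferred schema from " + file + ": " + describe(columns));
    }
  }
}
```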





[GitHub] [hive] jfsii commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
jfsii commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r922957748


##########
ql/src/test/queries/clientpositive/create_table_like_file.q:
##########
@@ -0,0 +1,80 @@
+--! qt:dataset:src
+
+-- all primitive types
+-- timestamp_w_tz TIMESTAMP WITH LOCAL TIME ZONE is not supported by hive's parquet implementation
+CREATE EXTERNAL TABLE test_all_types(tinyint_type TINYINT, smallint_type SMALLINT, bigint_type BIGINT, int_type INT, float_type FLOAT, double_type double, decimal_type DECIMAL(4,2), timestamp_type TIMESTAMP, date_type DATE, string_type STRING, varchar_type VARCHAR(100), char_type CHAR(34), boolean_type BOOLEAN, binary_type BINARY) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_all_types';
+-- insert two rows (the other tables only have 1 row)
+INSERT INTO test_all_types VALUES (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe'),
+       (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe');
+SELECT * FROM test_all_types;
+DESCRIBE test_all_types;
+-- CREATE A LIKE table
+CREATE TABLE like_test_all_types LIKE FILE PARQUET '${system:test.tmp.dir}/test_all_types/000000_0';
+INSERT INTO like_test_all_types VALUES (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe'),
+       (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe');
+SELECT * FROM like_test_all_types;
+DESCRIBE like_test_all_types;
+DROP TABLE test_all_types;
+DROP TABLE like_test_all_types;
+
+-- complex types (struct, array, map, union)
+-- union type is not supported by PARQUET in hive
+-- array
+CREATE EXTERNAL TABLE test_array(str_array array<string>) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_array';
+DESCRIBE test_array;
+INSERT INTO test_array SELECT array("bob", "sue");
+SELECT * FROM test_array;
+CREATE TABLE like_test_array LIKE FILE PARQUET '${system:test.tmp.dir}/test_array/000000_0';
+DESCRIBE like_test_array;
+INSERT INTO like_test_array SELECT array("bob", "sue");
+SELECT * FROM like_test_array;
+DROP TABLE like_test_array;
+
+-- map
+CREATE EXTERNAL TABLE test_map(simple_map map<int, string>, map_to_struct map<string, struct<i : int>>, map_to_map map<date,map<int, string>>, map_to_array map<binary, array<array<int>>>) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_map';
+DESCRIBE test_map;
+INSERT INTO test_map SELECT map(10, "foo"), map("bar", named_struct("i", 99)), map(cast('1984-01-01' as date), map(10, "goodbye")), map(cast("binary" as binary), array(array(1,2,3)));
+SELECT * FROM test_map;
+CREATE TABLE like_test_map LIKE FILE PARQUET '${system:test.tmp.dir}/test_map/000000_0';
+DESCRIBE like_test_map;
+INSERT INTO like_test_map SELECT map(10, "foo"), map("bar", named_struct("i", 99)), map(cast('1984-01-01' as date), map(10, "goodbye")), map(cast("binary" as binary), array(array(1,2,3)));
+SELECT * FROM like_test_map;
+DROP TABLE like_test_map;
+
+-- struct
+CREATE EXTERNAL TABLE test_complex_struct(struct_type struct<tinyint_type : tinyint, smallint_type : smallint, bigint_type : bigint, int_type : int, float_type : float, double_type : double, decimal_type : DECIMAL(4,2), timestamp_type : TIMESTAMP, date_type : DATE, string_type : STRING, varchar_type : VARCHAR(100), char_type : CHAR(34), boolean_type : boolean, binary_type : binary>) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_complex_struct';
+DESCRIBE test_complex_struct;
+-- disable CBO due to the fact that type conversion causes CBO failure which causes the test to fail
+-- non-CBO path works
+SET hive.cbo.enable=false;

Review Comment:
   HIVE-26398





[GitHub] [hive] jfsii commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
jfsii commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r922956639


##########
ql/src/test/queries/clientpositive/create_table_like_file.q:
##########
@@ -0,0 +1,80 @@
+--! qt:dataset:src

Review Comment:
   HIVE-26398





[GitHub] [hive] kasakrisz commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r921942089


##########
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java:
##########
@@ -64,16 +64,13 @@ public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context,
     // Analyze and create tbl properties object
     int numCh = ast.getChildCount();
 
-    tableName = BaseSemanticAnalyzer.getUnescapedName((ASTNode) ast

Review Comment:
   Having `TOK_LIKETABLE` in the AST even if it is not a `create table like` statement does not make sense, so I like your changes in `CreateDDLParser.g`.
   
   I'm not familiar with hcatalog. When is `CreateTableHook` called? Could you please share a test?





[GitHub] [hive] kasakrisz commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r921866506


##########
ql/src/test/queries/clientpositive/create_table_like_file.q:
##########
@@ -0,0 +1,80 @@
+--! qt:dataset:src
+
+-- all primitive types
+-- timestamp_w_tz TIMESTAMP WITH LOCAL TIME ZONE is not supported by hive's parquet implementation
+CREATE EXTERNAL TABLE test_all_types(tinyint_type TINYINT, smallint_type SMALLINT, bigint_type BIGINT, int_type INT, float_type FLOAT, double_type double, decimal_type DECIMAL(4,2), timestamp_type TIMESTAMP, date_type DATE, string_type STRING, varchar_type VARCHAR(100), char_type CHAR(34), boolean_type BOOLEAN, binary_type BINARY) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_all_types';
+-- insert two rows (the other tables only have 1 row)
+INSERT INTO test_all_types VALUES (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe'),
+       (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe');
+SELECT * FROM test_all_types;
+DESCRIBE test_all_types;
+-- CREATE A LIKE table
+CREATE TABLE like_test_all_types LIKE FILE PARQUET '${system:test.tmp.dir}/test_all_types/000000_0';
+INSERT INTO like_test_all_types VALUES (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe'),
+       (1, 2, 3, 4, 2.2, 2.2, 20.20, '2022-06-30 10:20:30', '2020-04-23', 'str1', 'varchar1', 'char', true, 'binary_maybe');
+SELECT * FROM like_test_all_types;
+DESCRIBE like_test_all_types;
+DROP TABLE test_all_types;
+DROP TABLE like_test_all_types;
+
+-- complex types (struct, array, map, union)
+-- union type is not supported by PARQUET in hive
+-- array
+CREATE EXTERNAL TABLE test_array(str_array array<string>) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_array';
+DESCRIBE test_array;
+INSERT INTO test_array SELECT array("bob", "sue");
+SELECT * FROM test_array;
+CREATE TABLE like_test_array LIKE FILE PARQUET '${system:test.tmp.dir}/test_array/000000_0';
+DESCRIBE like_test_array;
+INSERT INTO like_test_array SELECT array("bob", "sue");
+SELECT * FROM like_test_array;
+DROP TABLE like_test_array;
+
+-- map
+CREATE EXTERNAL TABLE test_map(simple_map map<int, string>, map_to_struct map<string, struct<i : int>>, map_to_map map<date,map<int, string>>, map_to_array map<binary, array<array<int>>>) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_map';
+DESCRIBE test_map;
+INSERT INTO test_map SELECT map(10, "foo"), map("bar", named_struct("i", 99)), map(cast('1984-01-01' as date), map(10, "goodbye")), map(cast("binary" as binary), array(array(1,2,3)));
+SELECT * FROM test_map;
+CREATE TABLE like_test_map LIKE FILE PARQUET '${system:test.tmp.dir}/test_map/000000_0';
+DESCRIBE like_test_map;
+INSERT INTO like_test_map SELECT map(10, "foo"), map("bar", named_struct("i", 99)), map(cast('1984-01-01' as date), map(10, "goodbye")), map(cast("binary" as binary), array(array(1,2,3)));
+SELECT * FROM like_test_map;
+DROP TABLE like_test_map;
+
+-- struct
+CREATE EXTERNAL TABLE test_complex_struct(struct_type struct<tinyint_type : tinyint, smallint_type : smallint, bigint_type : bigint, int_type : int, float_type : float, double_type : double, decimal_type : DECIMAL(4,2), timestamp_type : TIMESTAMP, date_type : DATE, string_type : STRING, varchar_type : VARCHAR(100), char_type : CHAR(34), boolean_type : boolean, binary_type : binary>) STORED AS PARQUET LOCATION '${system:test.tmp.dir}/test_complex_struct';
+DESCRIBE test_complex_struct;
+-- disable CBO due to the fact that type conversion causes CBO failure which causes the test to fail
+-- non-CBO path works
+SET hive.cbo.enable=false;

Review Comment:
   Is this a known bug? If no upstream Jira exists for it yet, could you please file one to track it?



##########
ql/src/test/queries/clientpositive/create_table_like_file.q:
##########
@@ -0,0 +1,80 @@
+--! qt:dataset:src

Review Comment:
   Do you reference the `src` table in any of the statements in this q file? If not, please remove this line.



##########
ql/src/test/results/clientnegative/create_table_like_invalid.q.out:
##########
@@ -0,0 +1 @@
+FAILED: SemanticException CREATE TABLE LIKE FILE is not supported by the 'AVRO' file format

Review Comment:
   Sorry, I forgot to mention earlier that creating the error message by calling `getErrorCodedMsg`
   ```
   throw new SemanticException(ErrorMsg.CTLF_UNSUPPORTED_FORMAT.getErrorCodedMsg(likeFileFormat));
   ```
   prints an error message consisting of the error code followed by the formatted message:
   ```
   FAILED: SemanticException [Error 10434]: CREATE TABLE LIKE FILE is not supported by the 'AVRO' file format
   ```
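
To illustrate the comment above, here is a minimal, self-contained sketch of how an error-coded message can be assembled. `SketchErrorMsg` is a hypothetical stand-in written for this example and is not Hive's `ErrorMsg` class; only the constant name, the error code 10434, and the message text are taken from the thread. It assumes the message template is formatted with `java.text.MessageFormat`.

```java
import java.text.MessageFormat;

// Hypothetical miniature of an error-code enum, for illustration only.
// It is NOT Hive's ErrorMsg; it merely mimics the "[Error N]: message" shape.
enum SketchErrorMsg {
  // Single quotes are doubled because MessageFormat treats ' as a quoting character.
  CTLF_UNSUPPORTED_FORMAT(10434,
      "CREATE TABLE LIKE FILE is not supported by the ''{0}'' file format");

  private final int errorCode;
  private final String pattern;

  SketchErrorMsg(int errorCode, String pattern) {
    this.errorCode = errorCode;
    this.pattern = pattern;
  }

  // Fills the placeholders of the message template; no error code is added.
  String format(Object... args) {
    return MessageFormat.format(pattern, args);
  }

  // Prefixes the formatted message with the numeric error code.
  String getErrorCodedMsg(Object... args) {
    return "[Error " + errorCode + "]: " + format(args);
  }

  public static void main(String[] args) {
    System.out.println(CTLF_UNSUPPORTED_FORMAT.getErrorCodedMsg("AVRO"));
  }
}
```

Keeping the code next to the template in one enum constant means every caller produces the same prefix, which is what makes the expected `.q.out` line stable.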



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] kasakrisz commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r922275929


##########
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java:
##########
@@ -64,16 +64,13 @@ public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context,
     // Analyze and create tbl properties object
     int numCh = ast.getChildCount();
 
-    tableName = BaseSemanticAnalyzer.getUnescapedName((ASTNode) ast

Review Comment:
   I don't think your changes in CreateTableHook.java break anything. LGTM.





[GitHub] [hive] jfsii commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
jfsii commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r921434398


##########
hcatalog/core/src/main/java/org/apache/hive/hcatalog/cli/SemanticAnalysis/CreateTableHook.java:
##########
@@ -64,16 +64,13 @@ public ASTNode preAnalyze(HiveSemanticAnalyzerHookContext context,
     // Analyze and create tbl properties object
     int numCh = ast.getChildCount();
 
-    tableName = BaseSemanticAnalyzer.getUnescapedName((ASTNode) ast

Review Comment:
   Context for the changes in this file: I am not sure this section of code ever worked, and I am unsure of its intent.
   Specifically, the hcatalog tests fail because this starts throwing the exception I removed. The reason is that TOK_LIKETABLE was part of every single CREATE TABLE AST (you can see me remove the always-present TOK_LIKETABLE in my refactor of the grammar). I attempted to keep it the way it was, to minimize the diff and to keep logical changes together, but after a few hours of battling ANTLR I couldn't figure out a clean way to do it. (Suggestions are welcome.)





[GitHub] [hive] kasakrisz commented on a diff in pull request #3443: HIVE-26395: Add support for CREATE TABLE LIKE FILE PARQUET

Posted by GitBox <gi...@apache.org>.
kasakrisz commented on code in PR #3443:
URL: https://github.com/apache/hive/pull/3443#discussion_r921375990


##########
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java:
##########
@@ -234,4 +251,161 @@ StructTypeInfo prune() {
       return (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(newNames, newTypes);
     }
   }
+
+  // ReadSchema interface implementation
+  private String convertGroupType(GroupType group) throws SerDeException {
+    boolean first = true;
+    StringBuilder sb = new StringBuilder(serdeConstants.STRUCT_TYPE_NAME + "<");
+    for (Type field: group.getFields()) {
+      if (first) {
+        first = false;
+      } else {
+        sb.append(",");
+      }
+      // fieldName:typeName
+      sb.append(field.getName() + ":" + convertParquetTypeToFieldType(field));

Review Comment:
   nit.:
   ```
   sb.append(field.getName()).append(":").append(convertParquetTypeToFieldType(field));
   ```
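
As a self-contained illustration of the chained-append style suggested in this nit, the sketch below builds a struct type string from plain name/type pairs. `StructTypeNameSketch` and `structTypeName` are hypothetical names invented for this example; the PR's `convertGroupType` works on a Parquet `GroupType` instead, and the literal `"struct"` stands in for `serdeConstants.STRUCT_TYPE_NAME`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the PR's convertGroupType: it takes plain
// name -> type-name pairs instead of a Parquet GroupType.
class StructTypeNameSketch {

  // Builds a struct type string of the form struct<name:type,name:type>.
  static String structTypeName(Map<String, String> fields) {
    StringBuilder sb = new StringBuilder("struct<");
    boolean first = true;
    for (Map.Entry<String, String> field : fields.entrySet()) {
      if (first) {
        first = false;
      } else {
        sb.append(",");
      }
      // fieldName:typeName, appended piece by piece; each append returns
      // the same StringBuilder, so no intermediate String is concatenated.
      sb.append(field.getKey()).append(":").append(field.getValue());
    }
    return sb.append(">").toString();
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps insertion order, so the field order is deterministic.
    Map<String, String> fields = new LinkedHashMap<>();
    fields.put("i", "int");
    fields.put("s", "string");
    System.out.println(structTypeName(fields)); // prints struct<i:int,s:string>
  }
}
```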



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##########
@@ -13477,6 +13480,23 @@ private boolean hasConstraints(final List<FieldSchema> partCols, final List<SQLD
     }
     return false;
   }
+
+  boolean doesSupportSchemaInference(String fileFormat) throws SemanticException {
+    StorageFormatFactory storageFormatFactory = new StorageFormatFactory();
+    StorageFormatDescriptor descriptor = storageFormatFactory.get(fileFormat);
+    if (descriptor == null) {
+      throw new SemanticException("CREATE TABLE LIKE FILE is not supported by the '" + likeFileFormat + "' file format");

Review Comment:
   Could you please create error messages for this in 
   https://github.com/apache/hive/blob/c6b07f27903f7d6856a6edf1b5d756119fa522c1/common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java#L44
   String formatting is supported:
   https://github.com/apache/hive/blob/c6b07f27903f7d6856a6edf1b5d756119fa522c1/common/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java#L481
   


