Posted to commits@hudi.apache.org by "slfan1989 (via GitHub)" <gi...@apache.org> on 2023/04/19 00:47:57 UTC

[GitHub] [hudi] slfan1989 commented on a diff in pull request #8478: [HUDI-6086] Improve HiveSchemaUtil#generateCreateDDL With ST

slfan1989 commented on code in PR #8478:
URL: https://github.com/apache/hudi/pull/8478#discussion_r1170695860


##########
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java:
##########
@@ -65,6 +67,34 @@ public class HiveSchemaUtil {
   public static final String BINARY_TYPE_NAME = "binary";
   public static final String DATE_TYPE_NAME = "date";
 
+  private static final String DATABASE_NAME = "DATABASE_NAME";
+  private static final String EXTERNAL = "external";
+  private static final String TABLE_NAME = "TABLE_NAME";
+  private static final String LIST_COLUMNS = "columns";
+  private static final String PARTITIONS = "partitions";
+  private static final String ROW_FORMAT = "row_format";
+  private static final String LOCATION = "location";
+  private static final String LOCATION_BLOCK = "location_block";
+  private static final String PROPERTIES = "properties";
+  private static final String BUCKETS = "buckets";
+
+  private static final String CREATE_DATABASE_STMT =
+      "CREATE DATABASE IF NOT EXISTS <" + DATABASE_NAME + ">";
+
+  private static final String CREATE_TABLE_TEMPLATE =
+      "CREATE <" + EXTERNAL + ">TABLE <if(" + DATABASE_NAME + ")>`<" + DATABASE_NAME + ">`.<endif>"
+      + "`<" + TABLE_NAME + ">`(\n"
+      + "<" + LIST_COLUMNS + ">)\n"
+      + "<" + PARTITIONS + ">\n"
+      + "<" + BUCKETS + ">\n"
+      + "<" + ROW_FORMAT + ">\n"
+      + "<" + LOCATION_BLOCK + ">"
+      + "TBLPROPERTIES (\n"
+      + "<" + PROPERTIES + ">)";

Review Comment:
   @danny0405 Thank you very much for helping review the code!
   
   1. ANTLR was introduced into the Hudi project by [HUDI-4111] Bump ANTLR runtime version in Spark 3.x (#5606). The version is declared in the parent pom.xml:
   ```
   <antlr.version>4.8</antlr.version>
   ```
   `hudi-hive-sync` already depends on `hive-exec`, so we can use `antlr` directly without declaring anything new.
   
   2. Thank you for the question! I agree that in some scenarios the ANTLR template (ST) syntax is not that straightforward. For generating SQL, though, I think it is a good fit, because the template lays out the components of the statement explicitly.
   
   HiveSchemaUtil#CREATE_TABLE_TEMPLATE
   ```
   private static final String CREATE_TABLE_TEMPLATE =
         "CREATE <" + EXTERNAL + ">TABLE <if(" + DATABASE_NAME + ")>`<" + DATABASE_NAME + ">`.<endif>"
         + "`<" + TABLE_NAME + ">`(\n"
         + "<" + LIST_COLUMNS + ">)\n"
         + "<" + PARTITIONS + ">\n"
         + "<" + BUCKETS + ">\n"
         + "<" + ROW_FORMAT + ">\n"
         + "<" + LOCATION_BLOCK + ">"
         + "TBLPROPERTIES (\n"
         + "<" + PROPERTIES + ">)";
   ```
   
   With this template, the elements of the generated SQL are visible at a glance: the database name, the table name, and the table's attributes (columns, partitions, buckets, row format, location, and properties). That makes this part of the code easier to read.
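   
   To make the readability argument concrete, here is a minimal, stdlib-only sketch of the idea. It is not the StringTemplate engine that the PR title refers to and it is not the code in this PR: the `render` helper, the class name, and the sample values are all hypothetical, and the template is shortened (no `<if(...)>` conditional, no partitions, buckets, or row format). It only shows how named `<placeholder>` slots keep the shape of the DDL visible in one place.
   
   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   
   // Hypothetical illustration only; not part of HiveSchemaUtil.
   class CreateTableTemplateSketch {
   
       // Shortened variant of CREATE_TABLE_TEMPLATE. The database name is
       // mandatory here because this sketch has no <if(...)> support.
       static final String TEMPLATE =
           "CREATE <external>TABLE `<DATABASE_NAME>`.`<TABLE_NAME>`(\n"
           + "<columns>)\n"
           + "<location_block>"
           + "TBLPROPERTIES (\n"
           + "<properties>)";
   
       // Naive substitution: replaces every "<key>" with its value.
       // Assumes the values themselves contain no "<key>" tokens.
       static String render(String template, Map<String, String> values) {
           String result = template;
           for (Map.Entry<String, String> entry : values.entrySet()) {
               result = result.replace("<" + entry.getKey() + ">", entry.getValue());
           }
           return result;
       }
   
       // Fills the template with made-up sample values.
       static String exampleDdl() {
           Map<String, String> values = new LinkedHashMap<>();
           values.put("external", "EXTERNAL ");
           values.put("DATABASE_NAME", "default");
           values.put("TABLE_NAME", "trips");
           values.put("columns", "  `id` bigint,\n  `name` string");
           values.put("location_block", "LOCATION\n  '/tmp/trips'\n");
           values.put("properties", "  'k'='v'");
           return render(TEMPLATE, values);
       }
   
       public static void main(String[] args) {
           System.out.println(exampleDdl());
       }
   }
   ```
   
   A real template engine additionally handles conditionals and escaping, which is why the sketch above should be read as an illustration of the layout benefit rather than as a replacement for it.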