Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/25 06:47:20 UTC

[GitHub] [hudi] waywtdcc opened a new pull request, #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

waywtdcc opened a new pull request, #7056:
URL: https://github.com/apache/hudi/pull/7056

   ### Change Logs
   
   Fix bug: failed to synchronize the Hive metadata of the Flink table.
   
   ### Impact
   
   Fix bug: failed to synchronize the Hive metadata of the Flink table.
   
   ### Risk level (write none, low medium or high below)
   
   
   
   ### Documentation Update
   
   [issue](https://issues.apache.org/jira/browse/HUDI-5088)
   
   ### Contributor's checklist
   
   - [y] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [y] Change Logs and Impact were stated clearly
   - [y] Adequate tests were added if applicable
   - [y] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] waywtdcc commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1296599163

   > [5088.patch.zip](https://github.com/apache/hudi/files/9876623/5088.patch.zip) Thanks for the fix. I have reviewed it and applied a patch.
       boolean withOperationField = Boolean.parseBoolean(table.getOptions().getOrDefault(FlinkOptions.CHANGELOG_ENABLED.key(), "false"));
   
   Thank you for the review. However, I think "false" should not be hard-coded; the option's default value should be used instead.
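
The point about not hard-coding "false" can be illustrated with a small, self-contained sketch. `BoolOption` below is a hypothetical stand-in for a typed option such as Flink's `ConfigOption<Boolean>` (it is not the Flink class), `readBoolean` is an illustrative helper rather than Hudi code, and the key and default are assumptions modeled on the `CHANGELOG_ENABLED` option discussed here. The idea is to look the key up in the raw options map and fall back to the default declared on the option itself.

```java
import java.util.HashMap;
import java.util.Map;

final class OptionDefaultSketch {

  /** Hypothetical stand-in for a typed option that carries its key and declared default. */
  static final class BoolOption {
    final String key;
    final boolean defaultValue;

    BoolOption(String key, boolean defaultValue) {
      this.key = key;
      this.defaultValue = defaultValue;
    }
  }

  // Assumption for illustration: key and default modeled on the thread's CHANGELOG_ENABLED option.
  static final BoolOption CHANGELOG_ENABLED = new BoolOption("changelog.enabled", false);

  /** Reads the option from the raw map and uses the option's own default when the key is absent. */
  static boolean readBoolean(Map<String, String> options, BoolOption option) {
    String raw = options.get(option.key);
    return raw == null ? option.defaultValue : Boolean.parseBoolean(raw);
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    // Key absent: the declared default is returned.
    System.out.println(readBoolean(options, CHANGELOG_ENABLED));
    // Key present: the configured value wins.
    options.put("changelog.enabled", "true");
    System.out.println(readBoolean(options, CHANGELOG_ENABLED));
  }
}
```

With this shape, changing the declared default in one place changes the fallback everywhere the helper is used.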




[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1290143445

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab4622b75c7c5adb5a9d225dacc73b49ac6336f Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1296689157

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab4622b75c7c5adb5a9d225dacc73b49ac6336f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562) 
   * 04cc5970603743e3c0eb7d25767195dc2f267152 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] waywtdcc commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1296597975

   > [5088.patch.zip](https://github.com/apache/hudi/files/9876623/5088.patch.zip) Thanks for the fix. I have reviewed it and applied a patch.
   
   Boolean.parseBoolean(table.getOptions().getOrDefault(FlinkOptions.CHANGELOG_ENABLED.key(), "false"));
   Thank you for the review. However, I think "false" should not be hard-coded; the option's default value should be used instead.




[GitHub] [hudi] danny0405 commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1009099027


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##########
@@ -799,7 +800,7 @@ public void dropPartition(
     try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(tablePath, table)) {
       boolean hiveStylePartitioning = Boolean.parseBoolean(table.getOptions().get(FlinkOptions.HIVE_STYLE_PARTITIONING.key()));
       writeClient.deletePartitions(
-          Collections.singletonList(HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, partitionSpec)),
+              Collections.singletonList(HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, partitionSpec)),
               HoodieActiveTimeline.createNewInstantTime())

Review Comment:
   Unnecessary change.



##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##########
@@ -546,7 +546,8 @@ private Table instantiateHiveTable(ObjectPath tablePath, CatalogBaseTable table,
     // because since Hive 3.x, there is validation when altering table,
     // when the metadata fields are synced through the hive sync tool,
     // a compatability issue would be reported.
-    List<FieldSchema> allColumns = HiveSchemaUtils.toHiveFieldSchema(table.getSchema());
+    boolean withOperationField = Configuration.fromMap(table.getOptions()).getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    List<FieldSchema> allColumns = HiveSchemaUtils.toHiveFieldSchema(table.getSchema(), withOperationField);

Review Comment:
   `Configuration.fromMap(table.getOptions())` is a heavy operation; we should avoid it.
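
One way to follow this suggestion is to read the flag straight from the options map and pass it down as a plain boolean, much as the diff does with `withOperationField`. The sketch below is an illustration under assumptions, not the merged code: `hiveColumnNames` and `changelogEnabled` are hypothetical helpers, the meta column names are assumed to mirror Hudi's `HoodieRecord` constants, and a plain `Map<String, String>` stands in for `table.getOptions()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class HiveColumnsSketch {

  // Assumed to mirror HoodieRecord.HOODIE_META_COLUMNS; listed inline so the sketch is self-contained.
  static final List<String> META_COLUMNS = Arrays.asList(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name");

  static final String OPERATION_FIELD = "_hoodie_operation";

  /** Cheap lookup on the raw options map; no intermediate configuration object is built. */
  static boolean changelogEnabled(Map<String, String> options) {
    // A missing key parses as false here; production code would use the option's declared default.
    return Boolean.parseBoolean(options.get("changelog.enabled"));
  }

  /** Hypothetical helper: meta columns first, then the optional operation field, then user columns. */
  static List<String> hiveColumnNames(List<String> userColumns, boolean withOperationField) {
    List<String> columns = new ArrayList<>(META_COLUMNS);
    if (withOperationField) {
      columns.add(OPERATION_FIELD);
    }
    columns.addAll(userColumns);
    return columns;
  }

  public static void main(String[] args) {
    Map<String, String> options = Collections.singletonMap("changelog.enabled", "true");
    boolean withOperationField = changelogEnabled(options);
    System.out.println(hiveColumnNames(Arrays.asList("id", "name"), withOperationField));
  }
}
```

Computing the flag once at the call site keeps the schema helper free of any dependency on how the options are stored.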





[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1297021020

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12675",
       "triggerID" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f52a3126d7112279ba3f5643b8e525617c8fbe7",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12679",
       "triggerID" : "7f52a3126d7112279ba3f5643b8e525617c8fbe7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 7f52a3126d7112279ba3f5643b8e525617c8fbe7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12679) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] waywtdcc commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1006426816


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   ![image](https://user-images.githubusercontent.com/59957056/198202085-65569dd0-1abe-44a3-bc8a-f330d0174f88.png)
   





[GitHub] [hudi] waywtdcc commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1009102237


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieHiveCatalog.java:
##########
@@ -799,7 +800,7 @@ public void dropPartition(
     try (HoodieFlinkWriteClient<?> writeClient = createWriteClient(tablePath, table)) {
       boolean hiveStylePartitioning = Boolean.parseBoolean(table.getOptions().get(FlinkOptions.HIVE_STYLE_PARTITIONING.key()));
       writeClient.deletePartitions(
-          Collections.singletonList(HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, partitionSpec)),
+              Collections.singletonList(HoodieCatalogUtil.inferPartitionPath(hiveStylePartitioning, partitionSpec)),
               HoodieActiveTimeline.createNewInstantTime())

Review Comment:
   The previous code was formatted incorrectly here; I corrected the indentation.





[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1296766506

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12675",
       "triggerID" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f52a3126d7112279ba3f5643b8e525617c8fbe7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7f52a3126d7112279ba3f5643b8e525617c8fbe7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab4622b75c7c5adb5a9d225dacc73b49ac6336f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562) 
   * 04cc5970603743e3c0eb7d25767195dc2f267152 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12675) 
   * 7f52a3126d7112279ba3f5643b8e525617c8fbe7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] waywtdcc commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1006361456


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   > Which version of Hudi did you use for the streaming writer?
   
   0.12.1





[GitHub] [hudi] danny0405 commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1006366625


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   Can you paste the error stack trace here? The writer is not expected to sync the `_hoodie_operation` meta field now.





[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1290136714

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab4622b75c7c5adb5a9d225dacc73b49ac6336f UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1296696096

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12675",
       "triggerID" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab4622b75c7c5adb5a9d225dacc73b49ac6336f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562) 
   * 04cc5970603743e3c0eb7d25767195dc2f267152 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12675) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] waywtdcc commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1006430155


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   This should be because the operation field exists in the parquet file and is not filtered out.
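
The suspected mismatch can be pictured as a set difference between the columns stored in the data files and the columns registered in Hive. The following is only a sketch under that assumption: `missingInHive` is a hypothetical helper, and the column lists in `main` are made-up sample data, not output from a real table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SchemaDiffSketch {

  /** Returns the file columns that have no counterpart in the Hive table, in file order. */
  static List<String> missingInHive(List<String> fileColumns, List<String> hiveColumns) {
    Set<String> registered = new HashSet<>(hiveColumns);
    List<String> missing = new ArrayList<>();
    for (String column : fileColumns) {
      if (!registered.contains(column)) {
        missing.add(column);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Sample data only: a file written with the operation field, and a Hive table created without it.
    List<String> fileColumns = Arrays.asList("_hoodie_commit_time", "_hoodie_operation", "id", "name");
    List<String> hiveColumns = Arrays.asList("_hoodie_commit_time", "id", "name");
    System.out.println(missingInHive(fileColumns, hiveColumns));
  }
}
```

An empty result would mean the two schemas agree on column names; a non-empty one lists the fields that a later sync would try to add.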





[GitHub] [hudi] danny0405 commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1006339111


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   Which version of Hudi did you use for the streaming writer?





[GitHub] [hudi] danny0405 commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1004079499


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   In current master, we no longer add the `_hoodie_operation` field to the Hive table. How was the Hive table created locally?





[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1291126742

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * aab4622b75c7c5adb5a9d225dacc73b49ac6336f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1006335372


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   The `_hoodie_operation` field is useless for the Hive table; we should not sync it.





[GitHub] [hudi] danny0405 commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
danny0405 commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1293092786

   [5088.patch.zip](https://github.com/apache/hudi/files/9876623/5088.patch.zip)
   Thanks for the fix. I have reviewed it and applied a patch.




[GitHub] [hudi] hudi-bot commented on pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #7056:
URL: https://github.com/apache/hudi/pull/7056#issuecomment-1296772716

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12562",
       "triggerID" : "aab4622b75c7c5adb5a9d225dacc73b49ac6336f",
       "triggerType" : "PUSH"
     }, {
       "hash" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12675",
       "triggerID" : "04cc5970603743e3c0eb7d25767195dc2f267152",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7f52a3126d7112279ba3f5643b8e525617c8fbe7",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12679",
       "triggerID" : "7f52a3126d7112279ba3f5643b8e525617c8fbe7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 04cc5970603743e3c0eb7d25767195dc2f267152 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12675) 
   * 7f52a3126d7112279ba3f5643b8e525617c8fbe7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12679) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 merged pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
danny0405 merged PR #7056:
URL: https://github.com/apache/hudi/pull/7056




[GitHub] [hudi] waywtdcc commented on a diff in pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

Posted by GitBox <gi...@apache.org>.
waywtdcc commented on code in PR #7056:
URL: https://github.com/apache/hudi/pull/7056#discussion_r1004081118


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HiveSchemaUtils.java:
##########
@@ -177,10 +180,19 @@ private static DataType toFlinkPrimitiveType(PrimitiveTypeInfo hiveType) {
 
   /**
    * Create Hive field schemas from Flink table schema including the hoodie metadata fields.
+   *
+   * @param table
    */
-  public static List<FieldSchema> toHiveFieldSchema(TableSchema schema) {
+  public static List<FieldSchema> toHiveFieldSchema(CatalogBaseTable table) {
+    TableSchema schema = table.getSchema();
+    Configuration configuration = Configuration.fromMap(table.getOptions());
+    Boolean changelogEnable = configuration.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    Collection<String> hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS;
+    if (changelogEnable) {
+      hoodieMetaColumns = HoodieRecord.HOODIE_META_COLUMNS_WITH_OPERATION;

Review Comment:
   By specifying `changelog.enabled=true` and `hive_sync.skip_ro_suffix=true`, after which Hive sync synchronizes the metadata automatically.
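
For readers who want to picture the setup described here, the two options can be collected in a map and rendered as a Flink SQL `WITH` clause. This is a minimal sketch: `toWithClause` is a hypothetical helper, only the two keys named in this comment are included, and any other options a real table would need (connector, path, Hive sync settings) are deliberately left out rather than guessed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class ReproOptionsSketch {

  /** Hypothetical helper: renders options as the body of a Flink SQL WITH clause. */
  static String toWithClause(Map<String, String> options) {
    StringJoiner joiner = new StringJoiner(",\n", "WITH (\n", "\n)");
    for (Map.Entry<String, String> entry : options.entrySet()) {
      joiner.add("  '" + entry.getKey() + "' = '" + entry.getValue() + "'");
    }
    return joiner.toString();
  }

  public static void main(String[] args) {
    // Only the two options mentioned in the comment; insertion order is preserved.
    Map<String, String> options = new LinkedHashMap<>();
    options.put("changelog.enabled", "true");
    options.put("hive_sync.skip_ro_suffix", "true");
    System.out.println(toWithClause(options));
  }
}
```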


