Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/10 06:55:29 UTC

[GitHub] [hudi] zhangyue19921010 opened a new pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

zhangyue19921010 opened a new pull request #4782:
URL: https://github.com/apache/hudi/pull/4782


   Please look at https://issues.apache.org/jira/browse/HUDI-3398 for details.
   
   ## What is the purpose of the pull request
   
   Fix `TableSchemaResolver` failing for the metadata table, whose base files are HFiles rather than Parquet.
   
   ## Brief change log
   The root cause is that https://github.com/apache/hudi/pull/4649 introduced a bug:
   it does not handle HFile or ORC base data files when checking `hasOperationField`.
   
   This patch also adds a UT. Without the fix, the UT fails:
   ```
   511  [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   6407 [main] WARN  org.apache.hudi.common.config.DFSPropertiesConfiguration  - Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
   6439 [main] WARN  org.apache.hudi.common.config.DFSPropertiesConfiguration  - Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
   7618 [main] WARN  org.apache.hudi.metadata.HoodieBackedTableMetadata  - Metadata table was not found at path /var/folders/61/77xdhf3x0x9g3t_vdd1c9_nwr4wznp/T/hoodie_test_path480310162801304201/.hoodie/metadata
   24864 [main] WARN  org.apache.hudi.common.table.TableSchemaResolver  - Failed to read operation field from avro schema
   java.lang.IllegalArgumentException: Unknown file format :/var/folders/61/77xdhf3x0x9g3t_vdd1c9_nwr4wznp/T/hoodie_test_path480310162801304201/.hoodie/metadata/files/files-0000_0-90-85_20220210144836531001.hfile
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:103)
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:119)
   	at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:480)
   	at org.apache.hudi.common.table.TableSchemaResolver.<init>(TableSchemaResolver.java:65)
   	at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:682)
   	at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:698)
   	at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:171)
   	at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:154)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:663)
   	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:675)
   	at org.apache.hudi.client.BaseHoodieWriteClient.lambda$writeTableMetadata$0(BaseHoodieWriteClient.java:270)
   	at org.apache.hudi.common.util.Option.ifPresent(Option.java:96)
   	at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:270)
   	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:226)
   	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:197)
   	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:125)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:644)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:292)
   	at org.apache.hudi.TestHoodieSparkSqlWriter.testTableSchemaResolver(TestHoodieSparkSqlWriter.scala:686)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
   	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:87)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:53)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:66)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:51)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:87)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:66)
   	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:69)
   	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
   	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
   	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
   27775 [main] WARN  org.apache.hudi.common.table.TableSchemaResolver  - Failed to read operation field from avro schema
   java.lang.IllegalArgumentException: Unknown file format :/var/folders/61/77xdhf3x0x9g3t_vdd1c9_nwr4wznp/T/hoodie_test_path480310162801304201/.hoodie/metadata/files/files-0000_0-90-85_20220210144836531001.hfile
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableParquetSchemaFromDataFile(TableSchemaResolver.java:103)
   	at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:119)
   	at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:480)
   	at org.apache.hudi.common.table.TableSchemaResolver.<init>(TableSchemaResolver.java:65)
   	at org.apache.hudi.TestHoodieSparkSqlWriter.testTableSchemaResolver(TestHoodieSparkSqlWriter.scala:702)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
   	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:87)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:53)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:66)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:51)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:87)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:66)
   	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:69)
   	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
   	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
   	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
   
   
   java.lang.AssertionError: assertion failed
   
   	at scala.Predef$.assert(Predef.scala:156)
   	at org.apache.hudi.TestHoodieSparkSqlWriter.testTableSchemaResolver(TestHoodieSparkSqlWriter.scala:710)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:688)
   	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
   	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
   	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
   	at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$6(TestMethodTestDescriptor.java:212)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:208)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:137)
   	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:71)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:139)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at java.util.ArrayList.forEach(ArrayList.java:1257)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:38)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$5(NodeTestTask.java:143)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$7(NodeTestTask.java:129)
   	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:127)
   	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:126)
   	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:84)
   	at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:32)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
   	at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:51)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:87)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:53)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:66)
   	at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:51)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:87)
   	at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:66)
   	at com.intellij.junit5.JUnit5IdeaTestRunner.startRunnerWithArgs(JUnit5IdeaTestRunner.java:69)
   	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
   	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
   	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
   ```
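   The root-cause description above (HFile and ORC base files not being recognized, leading to the `Unknown file format` exception in the log) boils down to dispatching on the base-file extension instead of assuming Parquet. The sketch below is purely illustrative: the class and method names are hypothetical and are not the actual Hudi API or the actual fix.
   
   ```java
   import java.util.Locale;
   
   public class BaseFileFormatSketch {
       enum BaseFileFormat { PARQUET, HFILE, ORC }
   
       // Resolve the format from the file suffix; an unrecognized suffix is an
       // error, mirroring the IllegalArgumentException seen in the log above.
       static BaseFileFormat fromPath(String path) {
           String p = path.toLowerCase(Locale.ROOT);
           if (p.endsWith(".parquet")) return BaseFileFormat.PARQUET;
           if (p.endsWith(".hfile"))   return BaseFileFormat.HFILE;
           if (p.endsWith(".orc"))     return BaseFileFormat.ORC;
           throw new IllegalArgumentException("Unknown file format :" + path);
       }
   
       public static void main(String[] args) {
           // A metadata-table base file ends in .hfile and must be recognized.
           System.out.println(fromPath(
               "/tmp/.hoodie/metadata/files/files-0000_0-90-85_20220210144836531001.hfile"));
       }
   }
   ```
   
   With this kind of dispatch in place, `hasOperationField` can pick a format-appropriate schema reader rather than failing on the metadata table's `.hfile` base files.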
   
   ## Verify this pull request
   
   This change added tests and can be verified as follows:
   
     - Added `testTableSchemaResolver` in `TestHoodieSparkSqlWriter` to verify the change.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1036948107


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   * fc91b2650562421de6a5c13e8c616e73ffcd6ac6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034681553


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034730777


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869",
       "triggerID" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871",
       "triggerID" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034563213


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034711976


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869",
       "triggerID" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805105993



##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for TableSchemaResolver.
+ */
+class TestTableSchemaResolver {
+  var spark: SparkSession = _
+  var sqlContext: SQLContext = _
+  var sc: SparkContext = _
+  var tempPath: java.nio.file.Path = _
+  var tempBootStrapPath: java.nio.file.Path = _
+  var hoodieFooTableName = "hoodie_foo_tbl"
+  var tempBasePath: String = _
+  var commonTableModifier: Map[String, String] = Map()
+  case class StringLongTest(uuid: String, ts: Long)
+
+  /**
+   * Setup method running before each test.
+   */
+  @BeforeEach
+  def setUp(): Unit = {
+    initSparkContext()
+    tempPath = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    tempBootStrapPath = java.nio.file.Files.createTempDirectory("hoodie_test_bootstrap")
+    tempBasePath = tempPath.toAbsolutePath.toString
+    commonTableModifier = getCommonParams(tempPath, hoodieFooTableName, HoodieTableType.COPY_ON_WRITE.name())
+  }
+
+  /**
+   * Tear down method running after each test.
+   */
+  @AfterEach
+  def tearDown(): Unit = {
+    cleanupSparkContexts()
+    FileUtils.deleteDirectory(tempPath.toFile)
+    FileUtils.deleteDirectory(tempBootStrapPath.toFile)
+  }
+
+  /**
+   * Utility method for initializing the spark context.
+   */
+  def initSparkContext(): Unit = {
+    spark = SparkSession.builder()
+      .appName(hoodieFooTableName)
+      .master("local[2]")
+      .withExtensions(new HoodieSparkSessionExtension)
+      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
+      .getOrCreate()
+    sc = spark.sparkContext
+    sc.setLogLevel("ERROR")
+    sqlContext = spark.sqlContext
+  }
+
+  /**
+   * Utility method for cleaning up spark resources.
+   */
+  def cleanupSparkContexts(): Unit = {
+    if (sqlContext != null) {
+      sqlContext.clearCache();
+      sqlContext = null;
+    }
+    if (sc != null) {
+      sc.stop()
+      sc = null
+    }
+    if (spark != null) {
+      spark.close()
+    }
+  }
+
+  /**
+   * Utility method for creating common params for writer.
+   *
+   * @param path               Path for hoodie table
+   * @param hoodieFooTableName Name of hoodie table
+   * @param tableType          Type of table
+   * @return                   Map of common params
+   */
+  def getCommonParams(path: java.nio.file.Path, hoodieFooTableName: String, tableType: String): Map[String, String] = {
+    Map("path" -> path.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> hoodieFooTableName,
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.TABLE_TYPE.key -> tableType,
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator")
+  }
+
+  /**
+   * Utility method for converting list of Row to list of Seq.
+   *
+   * @param inputList list of Row
+   * @return list of Seq
+   */
+  def convertRowListToSeq(inputList: java.util.List[Row]): Seq[Row] =
+    JavaConverters.asScalaIteratorConverter(inputList.iterator).asScala.toSeq
+
+  @Test
+  def testTableSchemaResolverInMetadataTable(): Unit = {
+    val schema = DataSourceTestUtils.getStructTypeExampleSchema
+    //create a new table
+    val tableName = hoodieFooTableName
+    val fooTableModifier = Map("path" -> tempPath.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> tableName,
+      "hoodie.avro.schema" -> schema.toString(),
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator",
+      "hoodie.metadata.compact.max.delta.commits" -> "2",
+      HoodieWriteConfig.ALLOW_OPERATION_METADATA_FIELD.key -> "true"
+    )
+
+    // generate the inserts
+    val structType = AvroConversionUtils.convertAvroSchemaToStructType(schema)
+    val records = DataSourceTestUtils.generateRandomRows(10)
+    val recordsSeq = convertRowListToSeq(records)
+    val df1 = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Overwrite, fooTableModifier, df1)
+
+    // do update
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableModifier, df1)
+
+    val metadataTablePath = tempPath.toAbsolutePath.toString + "/.hoodie/metadata"
+    val metaClient = HoodieTableMetaClient.builder()
+      .setBasePath(metadataTablePath)
+      .setConf(spark.sessionState.newHadoopConf())
+      .build()
+
+    // Delete latest metadata table deltacommit
+    // Get schema from metadata table hfile format base file.
+    val latestInstant = metaClient.getActiveTimeline.getCommitsTimeline.getReverseOrderedInstants.findFirst()
+    val path = new Path(metadataTablePath + "/.hoodie", latestInstant.get().getFileName)
+    val fs = path.getFileSystem(new Configuration())
+    fs.delete(path, false)
+    metaClient.reloadActiveTimeline()
+
+    var ori: Exception = null

Review comment:
Just changed it to `tableSchemaResolverParsingException`.







[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805105638



##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for TableSchemaResolver.
+ */
+class TestTableSchemaResolver {

Review comment:
emmmm I need to use Spark SQL related classes, so I probably can't move this UT into hudi-common :<
   But I renamed the class to `TestTableSchemaResolverWithSparkSQL` as you suggested.







[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r806379950



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +117,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType readSchemaFromBaseFile(String filePath) throws IOException {
+    if (filePath.contains(HoodieFileFormat.PARQUET.getFileExtension())) {
+      // this is a Parquet file
+      return readSchemaFromParquetBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.HFILE.getFileExtension())) {
+      // this is an HFile
+      return readSchemaFromHFileBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.ORC.getFileExtension())) {
+      // this is an ORC file
+      return readSchemaFromORCBaseFile(new Path(filePath));
+    } else {
+      throw new IllegalArgumentException("Unknown base file format: " + filePath);
+    }

Review comment:
       https://issues.apache.org/jira/browse/HUDI-3428
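   The fix above dispatches on the base file's extension. A minimal standalone sketch of that idea (hypothetical names, not Hudi's actual API; it uses `endsWith` rather than `contains` so an extension string embedded elsewhere in the path cannot be misclassified):

   ```java
   import java.util.Locale;

   public class BaseFileFormatSketch {
       // Hypothetical enum standing in for HoodieFileFormat's extensions.
       public enum Format {
           PARQUET(".parquet"), HFILE(".hfile"), ORC(".orc");
           final String ext;
           Format(String ext) { this.ext = ext; }
       }

       // Pick the format from the file suffix; unknown suffixes fail fast,
       // mirroring the IllegalArgumentException in the patch.
       public static Format detect(String filePath) {
           String lower = filePath.toLowerCase(Locale.ROOT);
           for (Format f : Format.values()) {
               if (lower.endsWith(f.ext)) {
                   return f;
               }
           }
           throw new IllegalArgumentException("Unknown base file format: " + filePath);
       }

       public static void main(String[] args) {
           // Metadata table base files are HFiles, which the pre-patch code rejected.
           System.out.println(detect("/tmp/.hoodie/metadata/files/files-0000_0-90-85_20220210.hfile"));
           System.out.println(detect("/data/part-0001.parquet"));
       }
   }
   ```

   HUDI-3428 tracks hardening this kind of extension matching further.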







[GitHub] [hudi] yihua merged pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
yihua merged pull request #4782:
URL: https://github.com/apache/hudi/pull/4782


   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034573420


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1036948107


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869",
       "triggerID" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871",
       "triggerID" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc91b2650562421de6a5c13e8c616e73ffcd6ac6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "fc91b2650562421de6a5c13e8c616e73ffcd6ac6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   * fc91b2650562421de6a5c13e8c616e73ffcd6ac6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>








[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034565025


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034684469


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869",
       "triggerID" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805105702



##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for SparkSqlWriter class.

Review comment:
       Changed.







[GitHub] [hudi] yihua commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r804927321



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +121,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType getMessageType(String filePath) throws IOException {

Review comment:
       nit: rename to `readSchemaFromBaseFile`?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -422,6 +443,55 @@ public MessageType readSchemaFromBaseFile(Path parquetFilePath) throws IOExcepti
     return fileFooter.getFileMetaData().getSchema();
   }
 
+  /**
+   * Read the parquet schema from an HFile.
+   */
+  public MessageType readSchemaFromHFileBaseFile(Path hFilePath) throws IOException {
+    LOG.info("Reading schema from " + hFilePath);
+
+    FileSystem fs = metaClient.getRawFs();
+    if (!fs.exists(hFilePath)) {
+      throw new IllegalArgumentException(
+          "Failed to read schema from data file " + hFilePath + ". File does not exist.");
+    }
+
+    CacheConfig cacheConfig = new CacheConfig(fs.getConf());
+    HoodieHFileReader<IndexedRecord> hFileReader = new HoodieHFileReader<>(fs.getConf(), hFilePath, cacheConfig);
+
+    return convertAvroSchemaToParquet(hFileReader.getSchema());
+  }
+
+
+  /**
+   * Read the parquet schema from an ORC file.
+   */
+  public MessageType readSchemaFromORCBaseFile(Path orcFilePath) throws IOException {

Review comment:
       After looking at our codebase, it would be better to reuse `BaseFileUtils` or `HoodieFileReaderFactory` to abstract out the format-specific logic here.  `BaseFileUtils::readAvroSchema` can be used to read the schema.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -422,6 +443,55 @@ public MessageType readSchemaFromBaseFile(Path parquetFilePath) throws IOExcepti
     return fileFooter.getFileMetaData().getSchema();
   }
 
+  /**
+   * Read the parquet schema from an HFile.
+   */
+  public MessageType readSchemaFromHFileBaseFile(Path hFilePath) throws IOException {
+    LOG.info("Reading schema from " + hFilePath);
+
+    FileSystem fs = metaClient.getRawFs();
+    if (!fs.exists(hFilePath)) {
+      throw new IllegalArgumentException(
+          "Failed to read schema from data file " + hFilePath + ". File does not exist.");
+    }
+
+    CacheConfig cacheConfig = new CacheConfig(fs.getConf());
+    HoodieHFileReader<IndexedRecord> hFileReader = new HoodieHFileReader<>(fs.getConf(), hFilePath, cacheConfig);
+
+    return convertAvroSchemaToParquet(hFileReader.getSchema());

Review comment:
       @nsivabalan @manojpec Is this the most efficient way of reading the schema from a HFile?

##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for SparkSqlWriter class.

Review comment:
       nit: fix scaladocs.

##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for SparkSqlWriter class.
+ */
+class TestTableSchemaResolver {

Review comment:
       put this class in the same package `org.apache.hudi.common.table`?

##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for SparkSqlWriter class.
+ */
+class TestTableSchemaResolver {

Review comment:
       It's good that you added functional tests for the TableSchemaResolver.  Could you annotate it as `@Tag("functional")` and rename it to `TestTableSchemaResolverWithSparkSQL`?
   
   Besides this, could you add unit tests in another Java class for the schema read methods for different file formats?  Or do you think that is unnecessary once `BaseFileUtils` is reused?

##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for SparkSqlWriter class.
+ */
+class TestTableSchemaResolver {
+  var spark: SparkSession = _
+  var sqlContext: SQLContext = _
+  var sc: SparkContext = _
+  var tempPath: java.nio.file.Path = _
+  var tempBootStrapPath: java.nio.file.Path = _
+  var hoodieFooTableName = "hoodie_foo_tbl"
+  var tempBasePath: String = _
+  var commonTableModifier: Map[String, String] = Map()
+  case class StringLongTest(uuid: String, ts: Long)
+
+  /**
+   * Setup method running before each test.
+   */
+  @BeforeEach
+  def setUp(): Unit = {
+    initSparkContext()
+    tempPath = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    tempBootStrapPath = java.nio.file.Files.createTempDirectory("hoodie_test_bootstrap")
+    tempBasePath = tempPath.toAbsolutePath.toString
+    commonTableModifier = getCommonParams(tempPath, hoodieFooTableName, HoodieTableType.COPY_ON_WRITE.name())
+  }
+
+  /**
+   * Tear down method running after each test.
+   */
+  @AfterEach
+  def tearDown(): Unit = {
+    cleanupSparkContexts()
+    FileUtils.deleteDirectory(tempPath.toFile)
+    FileUtils.deleteDirectory(tempBootStrapPath.toFile)
+  }
+
+  /**
+   * Utility method for initializing the spark context.
+   */
+  def initSparkContext(): Unit = {
+    spark = SparkSession.builder()
+      .appName(hoodieFooTableName)
+      .master("local[2]")
+      .withExtensions(new HoodieSparkSessionExtension)
+      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
+      .getOrCreate()
+    sc = spark.sparkContext
+    sc.setLogLevel("ERROR")
+    sqlContext = spark.sqlContext
+  }
+
+  /**
+   * Utility method for cleaning up spark resources.
+   */
+  def cleanupSparkContexts(): Unit = {
+    if (sqlContext != null) {
+      sqlContext.clearCache();
+      sqlContext = null;
+    }
+    if (sc != null) {
+      sc.stop()
+      sc = null
+    }
+    if (spark != null) {
+      spark.close()
+    }
+  }
+
+  /**
+   * Utility method for creating common params for writer.
+   *
+   * @param path               Path for hoodie table
+   * @param hoodieFooTableName Name of hoodie table
+   * @param tableType          Type of table
+   * @return                   Map of common params
+   */
+  def getCommonParams(path: java.nio.file.Path, hoodieFooTableName: String, tableType: String): Map[String, String] = {
+    Map("path" -> path.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> hoodieFooTableName,
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.TABLE_TYPE.key -> tableType,
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator")
+  }
+
+  /**
+   * Utility method for converting list of Row to list of Seq.
+   *
+   * @param inputList list of Row
+   * @return list of Seq
+   */
+  def convertRowListToSeq(inputList: java.util.List[Row]): Seq[Row] =
+    JavaConverters.asScalaIteratorConverter(inputList.iterator).asScala.toSeq
+
+  @Test
+  def testTableSchemaResolverInMetadataTable(): Unit = {
+    val schema = DataSourceTestUtils.getStructTypeExampleSchema
+    //create a new table
+    val tableName = hoodieFooTableName
+    val fooTableModifier = Map("path" -> tempPath.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> tableName,
+      "hoodie.avro.schema" -> schema.toString(),
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator",
+      "hoodie.metadata.compact.max.delta.commits" -> "2",
+      HoodieWriteConfig.ALLOW_OPERATION_METADATA_FIELD.key -> "true"
+    )
+
+    // generate the inserts
+    val structType = AvroConversionUtils.convertAvroSchemaToStructType(schema)
+    val records = DataSourceTestUtils.generateRandomRows(10)
+    val recordsSeq = convertRowListToSeq(records)
+    val df1 = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Overwrite, fooTableModifier, df1)
+
+    // do update
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableModifier, df1)
+
+    val metadataTablePath = tempPath.toAbsolutePath.toString + "/.hoodie/metadata"
+    val metaClient = HoodieTableMetaClient.builder()
+      .setBasePath(metadataTablePath)
+      .setConf(spark.sessionState.newHadoopConf())
+      .build()
+
+    // Delete latest metadata table deltacommit
+    // Get schema from metadata table hfile format base file.
+    val latestInstant = metaClient.getActiveTimeline.getCommitsTimeline.getReverseOrderedInstants.findFirst()
+    val path = new Path(metadataTablePath + "/.hoodie", latestInstant.get().getFileName)
+    val fs = path.getFileSystem(new Configuration())
+    fs.delete(path, false)
+    metaClient.reloadActiveTimeline()
+
+    var ori: Exception = null
+    try {
+      val schemaFromData = new TableSchemaResolver(metaClient).getTableAvroSchemaFromDataFile
+      val structFromData = AvroConversionUtils.convertAvroSchemaToStructType(HoodieAvroUtils.removeMetadataFields(schemaFromData))
+      val schemeDesign = new Schema.Parser().parse(HoodieMetadataRecord.getClassSchema.toString())
+      val structDesign = AvroConversionUtils.convertAvroSchemaToStructType(schemeDesign)
+      assertEquals(structFromData, structDesign)
+    } catch {
+      case e: Exception => ori = e;
+    }
+    assert(ori == null)
+  }
+
+  @ParameterizedTest
+  @CsvSource(Array("COPY_ON_WRITE,parquet","COPY_ON_WRITE,orc","MERGE_ON_READ,parquet","MERGE_ON_READ,orc"))

Review comment:
       add one for `MERGE_ON_READ,hfile` as well to be explicit?  I know that with the metadata table enabled, HFile is going to be tested, but it's still better to have an HFile test for the data table.







[GitHub] [hudi] yihua commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r806192810



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +117,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType readSchemaFromBaseFile(String filePath) throws IOException {
+    if (filePath.contains(HoodieFileFormat.PARQUET.getFileExtension())) {
+      // this is a parquet file
+      return readSchemaFromParquetBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.HFILE.getFileExtension())) {
+      // this is a HFile
+      return readSchemaFromHFileBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.ORC.getFileExtension())) {
+      // this is a ORC file
+      return readSchemaFromORCBaseFile(new Path(filePath));
+    } else {
+      throw new IllegalArgumentException("Unknown base file format :" + filePath);
+    }

Review comment:
       I was leaning more towards using the following pattern for the branch-offs, so that we can avoid having format-specific logic here, which makes adding a new format easier (it only requires changes in BaseFileUtils):
   ```
   BaseFileUtils.getInstance(filePath).readAvroSchema(conf, filePath)
   ```
   
   However, I see the HFile format is not included in `BaseFileUtils`.  Maybe this is fine for now.  @zhangyue19921010 could you file a ticket for fixing that as a follow-up?
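   The extension-based branch-off discussed above can be sketched as a plain lookup table. The sketch below is purely illustrative — `FormatDispatch` and `formatFor` are hypothetical names, not Hudi API, and the extension-to-format map only mirrors the PARQUET/HFILE/ORC entries quoted in the diff. Note it matches with `endsWith` rather than `contains`, which avoids accidentally matching an extension that merely appears somewhere inside the path:

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical sketch of extension-based base file format dispatch.
   public class FormatDispatch {

       // Known base file extensions, mirroring HoodieFileFormat's
       // PARQUET/HFILE/ORC entries from the quoted diff.
       private static final Map<String, String> EXT_TO_FORMAT = new LinkedHashMap<>();
       static {
           EXT_TO_FORMAT.put(".parquet", "PARQUET");
           EXT_TO_FORMAT.put(".hfile", "HFILE");
           EXT_TO_FORMAT.put(".orc", "ORC");
       }

       // Resolve the format from a base file path. endsWith ensures only the
       // file-name suffix decides the format, not any substring of the path.
       public static String formatFor(String filePath) {
           for (Map.Entry<String, String> e : EXT_TO_FORMAT.entrySet()) {
               if (filePath.endsWith(e.getKey())) {
                   return e.getValue();
               }
           }
           // Fail fast on unknown formats, like the exception in the bug report.
           throw new IllegalArgumentException("Unknown base file format :" + filePath);
       }

       public static void main(String[] args) {
           // Prints HFILE for a metadata table base file name.
           System.out.println(formatFor("files-0000_0-90-85_20220210144836531001.hfile"));
       }
   }
   ```

   Centralizing this mapping in one place is exactly why routing through a single factory such as `BaseFileUtils` is attractive: a new format then only touches the lookup, not every call site.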







[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r806336043



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +117,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType readSchemaFromBaseFile(String filePath) throws IOException {
+    if (filePath.contains(HoodieFileFormat.PARQUET.getFileExtension())) {
+      // this is a parquet file
+      return readSchemaFromParquetBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.HFILE.getFileExtension())) {
+      // this is a HFile
+      return readSchemaFromHFileBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.ORC.getFileExtension())) {
+      // this is a ORC file
+      return readSchemaFromORCBaseFile(new Path(filePath));
+    } else {
+      throw new IllegalArgumentException("Unknown base file format :" + filePath);
+    }

Review comment:
       Okay, will create related tickets to unify `BaseFileUtils` and improve the UTs ASAP.
   Thanks a lot for your review :)







[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034696521


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034693367


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034605357


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r804966522



##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for the TableSchemaResolver class.
+ */
+class TestTableSchemaResolver {
+  var spark: SparkSession = _
+  var sqlContext: SQLContext = _
+  var sc: SparkContext = _
+  var tempPath: java.nio.file.Path = _
+  var tempBootStrapPath: java.nio.file.Path = _
+  var hoodieFooTableName = "hoodie_foo_tbl"
+  var tempBasePath: String = _
+  var commonTableModifier: Map[String, String] = Map()
+  case class StringLongTest(uuid: String, ts: Long)
+
+  /**
+   * Setup method running before each test.
+   */
+  @BeforeEach
+  def setUp(): Unit = {
+    initSparkContext()
+    tempPath = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    tempBootStrapPath = java.nio.file.Files.createTempDirectory("hoodie_test_bootstrap")
+    tempBasePath = tempPath.toAbsolutePath.toString
+    commonTableModifier = getCommonParams(tempPath, hoodieFooTableName, HoodieTableType.COPY_ON_WRITE.name())
+  }
+
+  /**
+   * Tear down method running after each test.
+   */
+  @AfterEach
+  def tearDown(): Unit = {
+    cleanupSparkContexts()
+    FileUtils.deleteDirectory(tempPath.toFile)
+    FileUtils.deleteDirectory(tempBootStrapPath.toFile)
+  }
+
+  /**
+   * Utility method for initializing the spark context.
+   */
+  def initSparkContext(): Unit = {
+    spark = SparkSession.builder()
+      .appName(hoodieFooTableName)
+      .master("local[2]")
+      .withExtensions(new HoodieSparkSessionExtension)
+      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
+      .getOrCreate()
+    sc = spark.sparkContext
+    sc.setLogLevel("ERROR")
+    sqlContext = spark.sqlContext
+  }
+
+  /**
+   * Utility method for cleaning up spark resources.
+   */
+  def cleanupSparkContexts(): Unit = {
+    if (sqlContext != null) {
+      sqlContext.clearCache()
+      sqlContext = null
+    }
+    if (sc != null) {
+      sc.stop()
+      sc = null
+    }
+    if (spark != null) {
+      spark.close()
+    }
+  }
+
+  /**
+   * Utility method for creating common params for writer.
+   *
+   * @param path               Path for hoodie table
+   * @param hoodieFooTableName Name of hoodie table
+   * @param tableType          Type of table
+   * @return                   Map of common params
+   */
+  def getCommonParams(path: java.nio.file.Path, hoodieFooTableName: String, tableType: String): Map[String, String] = {
+    Map("path" -> path.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> hoodieFooTableName,
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.TABLE_TYPE.key -> tableType,
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator")
+  }
+
+  /**
+   * Utility method for converting list of Row to list of Seq.
+   *
+   * @param inputList list of Row
+   * @return list of Seq
+   */
+  def convertRowListToSeq(inputList: java.util.List[Row]): Seq[Row] =
+    JavaConverters.asScalaIteratorConverter(inputList.iterator).asScala.toSeq
+
+  @Test
+  def testTableSchemaResolverInMetadataTable(): Unit = {
+    val schema = DataSourceTestUtils.getStructTypeExampleSchema
+    // create a new table
+    val tableName = hoodieFooTableName
+    val fooTableModifier = Map("path" -> tempPath.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> tableName,
+      "hoodie.avro.schema" -> schema.toString(),
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator",
+      "hoodie.metadata.compact.max.delta.commits" -> "2",
+      HoodieWriteConfig.ALLOW_OPERATION_METADATA_FIELD.key -> "true"
+    )
+
+    // generate the inserts
+    val structType = AvroConversionUtils.convertAvroSchemaToStructType(schema)
+    val records = DataSourceTestUtils.generateRandomRows(10)
+    val recordsSeq = convertRowListToSeq(records)
+    val df1 = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Overwrite, fooTableModifier, df1)
+
+    // do update
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableModifier, df1)
+
+    val metadataTablePath = tempPath.toAbsolutePath.toString + "/.hoodie/metadata"
+    val metaClient = HoodieTableMetaClient.builder()
+      .setBasePath(metadataTablePath)
+      .setConf(spark.sessionState.newHadoopConf())
+      .build()
+
+    // Delete latest metadata table deltacommit
+    // Get schema from metadata table hfile format base file.
+    val latestInstant = metaClient.getActiveTimeline.getCommitsTimeline.getReverseOrderedInstants.findFirst()
+    val path = new Path(metadataTablePath + "/.hoodie", latestInstant.get().getFileName)
+    val fs = path.getFileSystem(new Configuration())
+    fs.delete(path, false)
+    metaClient.reloadActiveTimeline()
+
+    var ori: Exception = null

Review comment:
       Sorry, what is "ori"? Can we name it appropriately?

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -422,6 +443,55 @@ public MessageType readSchemaFromBaseFile(Path parquetFilePath) throws IOExcepti
     return fileFooter.getFileMetaData().getSchema();
   }
 
+  /**
+   * Read the parquet schema from a HFile.
+   */
+  public MessageType readSchemaFromHFileBaseFile(Path hFilePath) throws IOException {
+    LOG.info("Reading schema from " + hFilePath);
+
+    FileSystem fs = metaClient.getRawFs();
+    if (!fs.exists(hFilePath)) {
+      throw new IllegalArgumentException(
+          "Failed to read schema from data file " + hFilePath + ". File does not exist.");
+    }
+
+    CacheConfig cacheConfig = new CacheConfig(fs.getConf());
+    HoodieHFileReader<IndexedRecord> hFileReader = new HoodieHFileReader<>(fs.getConf(), hFilePath, cacheConfig);
+
+    return convertAvroSchemaToParquet(hFileReader.getSchema());

Review comment:
       Yeah, can't think of anything better. Maybe we can just pass an empty CacheConfig, or set everything to null. For example, I guess we have a config for prefetching, etc. Since we are only interested in the schema, we can disable all caching.
   

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -422,6 +443,55 @@ public MessageType readSchemaFromBaseFile(Path parquetFilePath) throws IOExcepti
     return fileFooter.getFileMetaData().getSchema();
   }
 
+  /**
+   * Read the parquet schema from a HFile.
+   */
+  public MessageType readSchemaFromHFileBaseFile(Path hFilePath) throws IOException {
+    LOG.info("Reading schema from " + hFilePath);
+
+    FileSystem fs = metaClient.getRawFs();
+    if (!fs.exists(hFilePath)) {

Review comment:
       Let's avoid fs.exists(). We already got the file from the commit metadata, so it's safe to assume it's a valid file. In general, we want to avoid direct filesystem calls where we can.
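As a minimal illustration of the suggestion (a hypothetical stand-in using plain `java.io` rather than Hadoop's `FileSystem`): open the file directly and handle the missing-file case on failure, instead of paying for a separate existence check first.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class OpenWithoutExistsCheck {
  // Hypothetical helper: instead of an up-front exists() call (an extra
  // round trip on HDFS/S3), attempt the read and translate the failure.
  static String openOrExplain(String path) {
    try (FileReader reader = new FileReader(path)) {
      return "opened";
    } catch (FileNotFoundException e) {
      return "missing: " + path;
    } catch (IOException e) {
      return "error: " + e.getMessage();
    }
  }
}
```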
   

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -422,6 +443,55 @@ public MessageType readSchemaFromBaseFile(Path parquetFilePath) throws IOExcepti
     return fileFooter.getFileMetaData().getSchema();
   }
 
+  /**
+   * Read the parquet schema from a HFile.
+   */
+  public MessageType readSchemaFromHFileBaseFile(Path hFilePath) throws IOException {
+    LOG.info("Reading schema from " + hFilePath);
+
+    FileSystem fs = metaClient.getRawFs();
+    if (!fs.exists(hFilePath)) {
+      throw new IllegalArgumentException(
+          "Failed to read schema from data file " + hFilePath + ". File does not exist.");
+    }
+
+    CacheConfig cacheConfig = new CacheConfig(fs.getConf());
+    HoodieHFileReader<IndexedRecord> hFileReader = new HoodieHFileReader<>(fs.getConf(), hFilePath, cacheConfig);
+
+    return convertAvroSchemaToParquet(hFileReader.getSchema());
+  }
+
+
+  /**
+   * Read the parquet schema from a ORC file.
+   */
+  public MessageType readSchemaFromORCBaseFile(Path orcFilePath) throws IOException {

Review comment:
       +1 







[GitHub] [hudi] nsivabalan commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1039342769


   Hey folks, this is blocking a [patch](https://github.com/apache/hudi/pull/4811) of mine. Can we prioritize this and get it landed? Thanks!





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034573420


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034569940


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034561491


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034561491


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034788977


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869",
       "triggerID" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871",
       "triggerID" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805105923



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -422,6 +443,55 @@ public MessageType readSchemaFromBaseFile(Path parquetFilePath) throws IOExcepti
     return fileFooter.getFileMetaData().getSchema();
   }
 
+  /**
+   * Read the parquet schema from a HFile.
+   */
+  public MessageType readSchemaFromHFileBaseFile(Path hFilePath) throws IOException {
+    LOG.info("Reading schema from " + hFilePath);
+
+    FileSystem fs = metaClient.getRawFs();
+    if (!fs.exists(hFilePath)) {

Review comment:
       Okay, changed all.







[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805104973



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +121,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType getMessageType(String filePath) throws IOException {

Review comment:
       Okay, changed.
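For context, the extension-based dispatch this patch adds can be sketched roughly as follows. `formatOf` and the returned labels are hypothetical stand-ins for the real `getMessageType(String filePath)`, which returns a Parquet `MessageType` by delegating to the per-format schema readers:

```java
// Hypothetical sketch of extension-based dispatch, mirroring the shape of
// getMessageType(String filePath); string labels stand in for the readers.
public class FormatDispatchSketch {
  static String formatOf(String filePath) {
    if (filePath.endsWith(".parquet")) {
      return "PARQUET"; // would call readSchemaFromBaseFile
    } else if (filePath.endsWith(".hfile")) {
      return "HFILE";   // would call readSchemaFromHFileBaseFile
    } else if (filePath.endsWith(".orc")) {
      return "ORC";     // would call readSchemaFromORCBaseFile
    }
    // Without the HFile/ORC branches this is the failure seen for the
    // metadata table, whose base files are HFiles.
    throw new IllegalArgumentException("Unknown file format :" + filePath);
  }
}
```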







[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1036949848


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869",
       "triggerID" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871",
       "triggerID" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "triggerType" : "PUSH"
     }, {
       "hash" : "fc91b2650562421de6a5c13e8c616e73ffcd6ac6",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5934",
       "triggerID" : "fc91b2650562421de6a5c13e8c616e73ffcd6ac6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   * fc91b2650562421de6a5c13e8c616e73ffcd6ac6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5934) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805105044



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -422,6 +443,55 @@ public MessageType readSchemaFromBaseFile(Path parquetFilePath) throws IOExcepti
     return fileFooter.getFileMetaData().getSchema();
   }
 
+  /**
+   * Read the parquet schema from a HFile.
+   */
+  public MessageType readSchemaFromHFileBaseFile(Path hFilePath) throws IOException {
+    LOG.info("Reading schema from " + hFilePath);
+
+    FileSystem fs = metaClient.getRawFs();
+    if (!fs.exists(hFilePath)) {
+      throw new IllegalArgumentException(
+          "Failed to read schema from data file " + hFilePath + ". File does not exist.");
+    }
+
+    CacheConfig cacheConfig = new CacheConfig(fs.getConf());
+    HoodieHFileReader<IndexedRecord> hFileReader = new HoodieHFileReader<>(fs.getConf(), hFilePath, cacheConfig);
+
+    return convertAvroSchemaToParquet(hFileReader.getSchema());
+  }
+
+
+  /**
+   * Read the parquet schema from a ORC file.
+   */
+  public MessageType readSchemaFromORCBaseFile(Path orcFilePath) throws IOException {

Review comment:
       Nice idea. Changed!







[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034730777


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d3f4e79167a0d44808b733cb9006632ede91ccc7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866",
       "triggerID" : "7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09",
       "triggerType" : "PUSH"
     }, {
       "hash" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "7afeafeb3a7f11cbcec83fba0eb2db8192db79a7",
       "triggerType" : "PUSH"
     }, {
       "hash" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869",
       "triggerID" : "55db5f1c4421c0b8a3de30264e616802eaaa11db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c0d8f22bba21151f167e889a48087aad6bebbc0d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d0523c6e103dea6aa39dd4d487d086fc5344db83",
       "triggerType" : "PUSH"
     }, {
       "hash" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871",
       "triggerID" : "c2786543e936c04c9efa747eca059a3acbb3a3ee",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034788977


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034577123


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034605357


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034681553


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034711976


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034696521


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034569940


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034563213


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] yihua commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
yihua commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034563630


   CC @nsivabalan @manojpec @xushiyan @XuQianJin-Stars 





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1036949848


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * c2786543e936c04c9efa747eca059a3acbb3a3ee Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5871) 
   * fc91b2650562421de6a5c13e8c616e73ffcd6ac6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5934) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1036974958


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   * d0523c6e103dea6aa39dd4d487d086fc5344db83 UNKNOWN
   * fc91b2650562421de6a5c13e8c616e73ffcd6ac6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5934) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] nsivabalan commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1039342769


   Hey folks, this is blocking a [patch](https://github.com/apache/hudi/pull/4811) of mine. Can we prioritize this and get it landed? Thanks!





[GitHub] [hudi] yihua commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r806192810



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +117,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType readSchemaFromBaseFile(String filePath) throws IOException {
+    if (filePath.contains(HoodieFileFormat.PARQUET.getFileExtension())) {
+      // this is a parquet file
+      return readSchemaFromParquetBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.HFILE.getFileExtension())) {
+      // this is a HFile
+      return readSchemaFromHFileBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.ORC.getFileExtension())) {
+      // this is a ORC file
+      return readSchemaFromORCBaseFile(new Path(filePath));
+    } else {
+      throw new IllegalArgumentException("Unknown base file format :" + filePath);
+    }

Review comment:
       I was leaning towards using the following pattern for the branch-offs so that we can avoid hard-coding specific formats here, which makes adding a new format easier (it would only require changes in BaseFileUtils):
   ```
   BaseFileUtils.getInstance(filePath).readAvroSchema(conf, filePath)
   ```
   
    However, I see the HFile format is not included in `BaseFileUtils`, so maybe this is fine for now. @zhangyue19921010 could you add a ticket for fixing that as a follow-up?
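
   The `getInstance(filePath)` pattern described above can be sketched as a small factory that dispatches on the file extension. This is a minimal, hypothetical illustration only; `SchemaReaderFactory` and `SchemaReader` are made-up names, not Hudi's actual API, and a real `readAvroSchema(conf, filePath)` call is replaced by a placeholder:

   ```java
   import java.util.Locale;

   // Hypothetical sketch of dispatch-by-extension, modeled on the
   // BaseFileUtils.getInstance(filePath) pattern suggested in this review.
   // None of these class or method names are Hudi's real API.
   public class SchemaReaderFactory {

     // Stand-in for a per-format utility such as a Parquet or ORC reader.
     interface SchemaReader {
       String formatName(); // placeholder for readAvroSchema(conf, filePath)
     }

     static SchemaReader getInstance(String filePath) {
       String lower = filePath.toLowerCase(Locale.ROOT);
       if (lower.endsWith(".parquet")) {
         return () -> "PARQUET";
       } else if (lower.endsWith(".hfile")) {
         return () -> "HFILE";
       } else if (lower.endsWith(".orc")) {
         return () -> "ORC";
       }
       // Mirrors the unknown-format failure from the bug report.
       throw new IllegalArgumentException("Unknown base file format: " + filePath);
     }

     public static void main(String[] args) {
       // The metadata table's base files are HFiles, which the original
       // check missed; dispatching on the extension handles them.
       System.out.println(getInstance("files-0000_0-90-85.hfile").formatName());
     }
   }
   ```

   Adding a new format then means registering one more reader in the factory, rather than touching every call site in TableSchemaResolver.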







[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r806336043



##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +117,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType readSchemaFromBaseFile(String filePath) throws IOException {
+    if (filePath.contains(HoodieFileFormat.PARQUET.getFileExtension())) {
+      // this is a parquet file
+      return readSchemaFromParquetBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.HFILE.getFileExtension())) {
+      // this is an HFile
+      return readSchemaFromHFileBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.ORC.getFileExtension())) {
+      // this is an ORC file
+      return readSchemaFromORCBaseFile(new Path(filePath));
+    } else {
+      throw new IllegalArgumentException("Unknown base file format: " + filePath);
+    }

Review comment:
       Okay, will create a related ticket to unify BaseFileUtils and also improve the UTs ASAP.
   Thanks a lot for your review :)

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/table/TableSchemaResolver.java
##########
@@ -115,6 +117,21 @@ private MessageType getTableParquetSchemaFromDataFile() {
     }
   }
 
+  private MessageType readSchemaFromBaseFile(String filePath) throws IOException {
+    if (filePath.contains(HoodieFileFormat.PARQUET.getFileExtension())) {
+      // this is a parquet file
+      return readSchemaFromParquetBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.HFILE.getFileExtension())) {
+      // this is an HFile
+      return readSchemaFromHFileBaseFile(new Path(filePath));
+    } else if (filePath.contains(HoodieFileFormat.ORC.getFileExtension())) {
+      // this is an ORC file
+      return readSchemaFromORCBaseFile(new Path(filePath));
+    } else {
+      throw new IllegalArgumentException("Unknown base file format: " + filePath);
+    }

Review comment:
       https://issues.apache.org/jira/browse/HUDI-3428







[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034684469


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034565025


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034577123


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7ef305b4e5f5d5e35841c3e1e3b0aa0730742c09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5866) 
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#issuecomment-1034693367


   ## CI report:
   
   * d3f4e79167a0d44808b733cb9006632ede91ccc7 UNKNOWN
   * 7afeafeb3a7f11cbcec83fba0eb2db8192db79a7 UNKNOWN
   * 55db5f1c4421c0b8a3de30264e616802eaaa11db Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5869) 
   * e1a13ccdd713d2bfa19d9deb9e9d8e1b5d55271e UNKNOWN
   * c0d8f22bba21151f167e889a48087aad6bebbc0d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805105681



##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for the TableSchemaResolver class.
+ */
+class TestTableSchemaResolver {
+  var spark: SparkSession = _
+  var sqlContext: SQLContext = _
+  var sc: SparkContext = _
+  var tempPath: java.nio.file.Path = _
+  var tempBootStrapPath: java.nio.file.Path = _
+  var hoodieFooTableName = "hoodie_foo_tbl"
+  var tempBasePath: String = _
+  var commonTableModifier: Map[String, String] = Map()
+  case class StringLongTest(uuid: String, ts: Long)
+
+  /**
+   * Setup method running before each test.
+   */
+  @BeforeEach
+  def setUp(): Unit = {
+    initSparkContext()
+    tempPath = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    tempBootStrapPath = java.nio.file.Files.createTempDirectory("hoodie_test_bootstrap")
+    tempBasePath = tempPath.toAbsolutePath.toString
+    commonTableModifier = getCommonParams(tempPath, hoodieFooTableName, HoodieTableType.COPY_ON_WRITE.name())
+  }
+
+  /**
+   * Tear down method running after each test.
+   */
+  @AfterEach
+  def tearDown(): Unit = {
+    cleanupSparkContexts()
+    FileUtils.deleteDirectory(tempPath.toFile)
+    FileUtils.deleteDirectory(tempBootStrapPath.toFile)
+  }
+
+  /**
+   * Utility method for initializing the spark context.
+   */
+  def initSparkContext(): Unit = {
+    spark = SparkSession.builder()
+      .appName(hoodieFooTableName)
+      .master("local[2]")
+      .withExtensions(new HoodieSparkSessionExtension)
+      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
+      .getOrCreate()
+    sc = spark.sparkContext
+    sc.setLogLevel("ERROR")
+    sqlContext = spark.sqlContext
+  }
+
+  /**
+   * Utility method for cleaning up spark resources.
+   */
+  def cleanupSparkContexts(): Unit = {
+    if (sqlContext != null) {
+      sqlContext.clearCache();
+      sqlContext = null;
+    }
+    if (sc != null) {
+      sc.stop()
+      sc = null
+    }
+    if (spark != null) {
+      spark.close()
+    }
+  }
+
+  /**
+   * Utility method for creating common params for writer.
+   *
+   * @param path               Path for hoodie table
+   * @param hoodieFooTableName Name of hoodie table
+   * @param tableType          Type of table
+   * @return                   Map of common params
+   */
+  def getCommonParams(path: java.nio.file.Path, hoodieFooTableName: String, tableType: String): Map[String, String] = {
+    Map("path" -> path.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> hoodieFooTableName,
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.TABLE_TYPE.key -> tableType,
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator")
+  }
+
+  /**
+   * Utility method for converting list of Row to list of Seq.
+   *
+   * @param inputList list of Row
+   * @return list of Seq
+   */
+  def convertRowListToSeq(inputList: java.util.List[Row]): Seq[Row] =
+    JavaConverters.asScalaIteratorConverter(inputList.iterator).asScala.toSeq
+
+  @Test
+  def testTableSchemaResolverInMetadataTable(): Unit = {
+    val schema = DataSourceTestUtils.getStructTypeExampleSchema
+    // create a new table
+    val tableName = hoodieFooTableName
+    val fooTableModifier = Map("path" -> tempPath.toAbsolutePath.toString,
+      HoodieWriteConfig.TBL_NAME.key -> tableName,
+      "hoodie.avro.schema" -> schema.toString(),
+      "hoodie.insert.shuffle.parallelism" -> "1",
+      "hoodie.upsert.shuffle.parallelism" -> "1",
+      DataSourceWriteOptions.RECORDKEY_FIELD.key -> "_row_key",
+      DataSourceWriteOptions.PARTITIONPATH_FIELD.key -> "partition",
+      DataSourceWriteOptions.KEYGENERATOR_CLASS_NAME.key -> "org.apache.hudi.keygen.SimpleKeyGenerator",
+      "hoodie.metadata.compact.max.delta.commits" -> "2",
+      HoodieWriteConfig.ALLOW_OPERATION_METADATA_FIELD.key -> "true"
+    )
+
+    // generate the inserts
+    val structType = AvroConversionUtils.convertAvroSchemaToStructType(schema)
+    val records = DataSourceTestUtils.generateRandomRows(10)
+    val recordsSeq = convertRowListToSeq(records)
+    val df1 = spark.createDataFrame(sc.parallelize(recordsSeq), structType)
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Overwrite, fooTableModifier, df1)
+
+    // do update
+    HoodieSparkSqlWriter.write(sqlContext, SaveMode.Append, fooTableModifier, df1)
+
+    val metadataTablePath = tempPath.toAbsolutePath.toString + "/.hoodie/metadata"
+    val metaClient = HoodieTableMetaClient.builder()
+      .setBasePath(metadataTablePath)
+      .setConf(spark.sessionState.newHadoopConf())
+      .build()
+
+    // Delete latest metadata table deltacommit
+    // Get schema from metadata table hfile format base file.
+    val latestInstant = metaClient.getActiveTimeline.getCommitsTimeline.getReverseOrderedInstants.findFirst()
+    val path = new Path(metadataTablePath + "/.hoodie", latestInstant.get().getFileName)
+    val fs = path.getFileSystem(new Configuration())
+    fs.delete(path, false)
+    metaClient.reloadActiveTimeline()
+
+    var ori: Exception = null
+    try {
+      val schemaFromData = new TableSchemaResolver(metaClient).getTableAvroSchemaFromDataFile
+      val structFromData = AvroConversionUtils.convertAvroSchemaToStructType(HoodieAvroUtils.removeMetadataFields(schemaFromData))
+      val schemeDesign = new Schema.Parser().parse(HoodieMetadataRecord.getClassSchema.toString())
+      val structDesign = AvroConversionUtils.convertAvroSchemaToStructType(schemeDesign)
+      assertEquals(structFromData, structDesign)
+    } catch {
+      case e: Exception => ori = e;
+    }
+    assert(ori == null)
+  }
+
+  @ParameterizedTest
+  @CsvSource(Array("COPY_ON_WRITE,parquet","COPY_ON_WRITE,orc","MERGE_ON_READ,parquet","MERGE_ON_READ,orc"))

Review comment:
       Sure, added `MERGE_ON_READ,hfile` and `COPY_ON_WRITE,hfile`.







[GitHub] [hudi] zhangyue19921010 commented on a change in pull request #4782: [HUDI-3398] TableSchemaResolver may fail for metadata table

Posted by GitBox <gi...@apache.org>.
zhangyue19921010 commented on a change in pull request #4782:
URL: https://github.com/apache/hudi/pull/4782#discussion_r805105894



##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestTableSchemaResolver.scala
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi
+
+import org.apache.avro.Schema
+import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hudi.avro.HoodieAvroUtils
+import org.apache.hudi.avro.model.HoodieMetadataRecord
+import org.apache.hudi.common.model._
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.config.HoodieWriteConfig
+import org.apache.hudi.testutils.DataSourceTestUtils
+import org.apache.spark.SparkContext
+import org.apache.spark.sql._
+import org.apache.spark.sql.hudi.HoodieSparkSessionExtension
+import org.junit.jupiter.api.Assertions.{assertEquals, assertTrue}
+import org.junit.jupiter.api.{AfterEach, BeforeEach, Test}
+import org.junit.jupiter.params.ParameterizedTest
+import org.junit.jupiter.params.provider.CsvSource
+
+import scala.collection.JavaConverters
+
+/**
+ * Test suite for the TableSchemaResolver class.
+ */
+class TestTableSchemaResolver {

Review comment:
       Sure thing, changed. Also, we can add a new def named `schemaValuationBasedOnDataFile` which calls the `getTableAvroSchemaFromDataFile` API to read the schema from a data file and validate it.
   And I believe this can cover the schema read methods for the different file formats :)
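The shape of that validation step can be sketched as follows. This is a simplified stand-in, not the actual Hudi code: the schema is modeled as a list of field names, `removeMetadataFields` mimics `HoodieAvroUtils.removeMetadataFields`, and the `_hoodie_` prefix is the assumed naming convention for Hudi's metadata columns:

```java
import java.util.List;
import java.util.stream.Collectors;

public class SchemaValidationSketch {
    // Hudi's metadata columns conventionally start with "_hoodie_"
    // (e.g. _hoodie_commit_time, _hoodie_operation).
    private static final String METADATA_PREFIX = "_hoodie_";

    // Stand-in for HoodieAvroUtils.removeMetadataFields: drop metadata
    // columns before comparing against the designed schema.
    static List<String> removeMetadataFields(List<String> fieldNames) {
        return fieldNames.stream()
                .filter(name -> !name.startsWith(METADATA_PREFIX))
                .collect(Collectors.toList());
    }

    // Validation: the schema read from the data file should equal the
    // designed schema once metadata fields are stripped.
    static boolean schemasMatch(List<String> fromDataFile, List<String> designed) {
        return removeMetadataFields(fromDataFile).equals(designed);
    }

    public static void main(String[] args) {
        List<String> fromDataFile = List.of("_hoodie_commit_time", "key", "filesystemMetadata");
        List<String> designed = List.of("key", "filesystemMetadata");
        System.out.println(schemasMatch(fromDataFile, designed)); // true
    }
}
```

The real helper would compare Avro `Schema` objects (as the test above does via `AvroConversionUtils` and `Schema.Parser`) rather than field-name lists, but the strip-then-compare structure is the same.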



