Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/14 01:39:56 UTC

[GitHub] [hudi] nsivabalan commented on a change in pull request #4468: [HUDI-3130] Fixing Hive getSchema for RT tables addressing different partitions having different schemas

nsivabalan commented on a change in pull request #4468:
URL: https://github.com/apache/hudi/pull/4468#discussion_r784448964



##########
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/AbstractRealtimeRecordReader.java
##########
@@ -77,19 +74,17 @@ private boolean usesCustomPayload() {
   }
 
   /**
-   * Goes through the log files in reverse order and finds the schema from the last available data block. If not, falls
+   * Gets schema from HoodieTableMetaClient. If not, falls
    * back to the schema from the latest parquet file. Finally, sets the partition column and projection fields into the
    * job conf.
    */
-  private void init() throws IOException {
-    Schema schemaFromLogFile = LogReaderUtils.readLatestSchemaFromLogFiles(split.getBasePath(), split.getDeltaLogFiles(), jobConf);
-    if (schemaFromLogFile == null) {
-      writerSchema = InputSplitUtils.getBaseFileSchema((FileSplit)split, jobConf);
-      LOG.info("Writer Schema From Parquet => " + writerSchema.getFields());
-    } else {
-      writerSchema = schemaFromLogFile;
-      LOG.info("Writer Schema From Log => " + writerSchema.toString(true));
-    }
+  private void init() throws Exception {
+
+    HoodieTableMetaClient metaClient = HoodieTableMetaClient.builder().setConf(split.getPath().getFileSystem(jobConf).getConf()).setBasePath(split.getBasePath()).build();
+    TableSchemaResolver schemaUtil = new TableSchemaResolver(metaClient);

Review comment:
       Yes, but we also need to be mindful of accessing the meta client from within the log record reader; the layering does not sit well. Why would a log record reader instantiate or use a meta client? Maybe we can pass in only the schema resolver from the upper layer. I mean: cache it at the table/split level and pass that into the AbstractLogRecordReaders.
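A minimal sketch of the layering suggested in this comment, under stated assumptions. It does not use Hudi classes: `SchemaSource`, `SchemaSourceCache`, and `RecordReaderSketch` are hypothetical stand-ins for the schema resolver, a table-level cache held by the upper layer, and the record reader. The schema is a plain string here purely for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class SchemaResolverLayeringSketch {

  /** Hypothetical stand-in for a table-level schema resolver (not a Hudi type). */
  public interface SchemaSource {
    String tableSchema();
  }

  /** Hypothetical upper-layer cache: at most one resolver is built per table base path. */
  public static final class SchemaSourceCache {
    private final Map<String, SchemaSource> byBasePath = new ConcurrentHashMap<>();
    private final Function<String, SchemaSource> factory;

    public SchemaSourceCache(Function<String, SchemaSource> factory) {
      this.factory = factory;
    }

    public SchemaSource forTable(String basePath) {
      // The factory (the expensive step, e.g. building a meta client) runs once per base path.
      return byBasePath.computeIfAbsent(basePath, factory);
    }
  }

  /** Hypothetical reader: it receives the resolver and never builds a meta client itself. */
  public static final class RecordReaderSketch {
    private final String writerSchema;

    public RecordReaderSketch(SchemaSource schemaSource) {
      this.writerSchema = schemaSource.tableSchema();
    }

    public String writerSchema() {
      return writerSchema;
    }
  }

  public static void main(String[] args) {
    int[] builds = {0}; // counts how often the factory is invoked
    SchemaSourceCache cache = new SchemaSourceCache(basePath -> {
      builds[0]++;
      return () -> "schema-of:" + basePath;
    });

    // Two splits of the same table share one resolver instance.
    RecordReaderSketch first = new RecordReaderSketch(cache.forTable("/tables/trips"));
    RecordReaderSketch second = new RecordReaderSketch(cache.forTable("/tables/trips"));

    System.out.println(first.writerSchema().equals(second.writerSchema()));
    System.out.println("resolver builds: " + builds[0]);
  }
}
```

The point of the design is that the reader depends only on the narrow resolver interface, while the decision of when to build and cache the expensive object stays with the caller that owns the table or split.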



