You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/02 16:07:14 UTC

[GitHub] [hudi] nsivabalan commented on a change in pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

nsivabalan commented on a change in pull request #2334:
URL: https://github.com/apache/hudi/pull/2334#discussion_r550891438



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java
##########
@@ -152,9 +155,9 @@ public void write() {
         final String key = keyIterator.next();
         HoodieRecord<T> record = recordMap.get(key);
         if (useWriterSchema) {
-          write(record, record.getData().getInsertValue(writerSchemaWithMetafields));
+          write(record, record.getData().getInsertValue(inputSchemaWithMetaFields));

Review comment:
       minor. not exactly related to this patch. But can we rename "useWriterSchema" to something like "isCompactor" so that its explicit and comprehensible. 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -53,47 +52,70 @@
 
   private static final Logger LOG = LogManager.getLogger(HoodieWriteHandle.class);
 
-  protected final Schema writerSchema;
-  protected final Schema writerSchemaWithMetafields;
+  /**
+   * The table schema is the schema of the table which used to read/write record from table.
+   */
+  protected final Schema tableSchema;
+  /**
+   * The table schema with meta fields.
+   */
+  protected final Schema tableSchemaWithMetaFields;
+  /**
+   * The input schema is the input data schema which used to parse data from incoming record.
+   */
+  protected final Schema inputSchema;
+  /**
+   * The input shema with meta fields.

Review comment:
       minor. typo. "schema"

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java
##########
@@ -574,9 +574,8 @@ private void validateSchema() throws HoodieUpsertException, HoodieInsertExceptio
     Schema writerSchema;
     boolean isValid;
     try {
-      TableSchemaResolver schemaUtil = new TableSchemaResolver(getMetaClient());
       writerSchema = HoodieAvroUtils.createHoodieWriteSchema(config.getSchema());
-      tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaUtil.getTableAvroSchemaWithoutMetadataFields());
+      tableSchema = getTableSchema(config, true);

Review comment:
       isn't the 2nd arg supposed to be false? 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieBootstrapHandle.java
##########
@@ -38,8 +38,7 @@
   public HoodieBootstrapHandle(HoodieWriteConfig config, String commitTime, HoodieTable<T, I, K, O> hoodieTable,
       String partitionPath, String fileId, TaskContextSupplier taskContextSupplier) {
     super(config, commitTime, hoodieTable, partitionPath, fileId,
-        Pair.of(HoodieAvroUtils.RECORD_KEY_SCHEMA,
-            HoodieAvroUtils.addMetadataFields(HoodieAvroUtils.RECORD_KEY_SCHEMA)), taskContextSupplier);

Review comment:
       @bvaradar : can you confirm that this change looks good. 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
##########
@@ -93,18 +93,14 @@ public HoodieMergeHandle(HoodieWriteConfig config, String instantTime, HoodieTab
     init(fileId, this.partitionPath, dataFileToBeMerged);
   }
 
-  @Override
-  public Schema getWriterSchemaWithMetafields() {
-    return writerSchemaWithMetafields;
-  }
-
   public Schema getWriterSchema() {
-    return writerSchema;
+    return tableSchema;

Review comment:
       Feel we could rename this method to getReaderSchema() since this is used only at one place 
   ```
   bootstrapReadSchema = mergeHandle.getWriterSchema();
   ```
   And we assign it to a var to store read schema anyways. 

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -53,47 +52,70 @@
 
   private static final Logger LOG = LogManager.getLogger(HoodieWriteHandle.class);
 
-  protected final Schema writerSchema;
-  protected final Schema writerSchemaWithMetafields;
+  /**
+   * The table schema is the schema of the table which used to read/write record from table.
+   */
+  protected final Schema tableSchema;

Review comment:
       thanks for this change. Definitely looks more readable now and avoids any confusion

##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java
##########
@@ -53,47 +52,70 @@
 
   private static final Logger LOG = LogManager.getLogger(HoodieWriteHandle.class);
 
-  protected final Schema writerSchema;
-  protected final Schema writerSchemaWithMetafields;
+  /**
+   * The table schema is the schema of the table which used to read/write record from table.
+   */
+  protected final Schema tableSchema;
+  /**
+   * The table schema with meta fields.
+   */
+  protected final Schema tableSchemaWithMetaFields;
+  /**
+   * The input schema is the input data schema which used to parse data from incoming record.
+   */
+  protected final Schema inputSchema;
+  /**
+   * The input shema with meta fields.
+   */
+  protected final Schema inputSchemaWithMetaFields;
+
   protected HoodieTimer timer;
   protected final WriteStatus writeStatus;
   protected final String partitionPath;
   protected final String fileId;
   protected final String writeToken;
   protected final TaskContextSupplier taskContextSupplier;
 
-  public HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String partitionPath,
-                           String fileId, HoodieTable<T, I, K, O> hoodieTable, TaskContextSupplier taskContextSupplier) {
-    this(config, instantTime, partitionPath, fileId, hoodieTable,
-        getWriterSchemaIncludingAndExcludingMetadataPair(config), taskContextSupplier);
-  }
+  /**
+   *
+   * @param config the write config
+   * @param instantTime the instance time
+   * @param partitionPath the partition path
+   * @param fileId the file id
+   * @param hoodieTable the hoodie table
+   * @param schemaOption the option schema specified for HoodieBootstrapHandle
+   * @param taskContextSupplier
+   */
+  protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String partitionPath,
+                           String fileId, HoodieTable<T, I, K, O> hoodieTable,
+                           Option<Schema> schemaOption,
+                           TaskContextSupplier taskContextSupplier) {
 
-  protected HoodieWriteHandle(HoodieWriteConfig config, String instantTime, String partitionPath, String fileId,
-                              HoodieTable<T, I, K, O> hoodieTable, Pair<Schema, Schema> writerSchemaIncludingAndExcludingMetadataPair,
-                              TaskContextSupplier taskContextSupplier) {
     super(config, instantTime, hoodieTable);
     this.partitionPath = partitionPath;
     this.fileId = fileId;
-    this.writerSchema = writerSchemaIncludingAndExcludingMetadataPair.getKey();
-    this.writerSchemaWithMetafields = writerSchemaIncludingAndExcludingMetadataPair.getValue();
+    // Here the hoodieTable#config may not equal to the config field, we use the 'config' from the
+    // construct method
+    this.tableSchema = schemaOption.orElseGet(() -> hoodieTable.getTableSchema(config, false));

Review comment:
       trying to confirm my understanding. Only for bootstrap code path, we need the table schema to be initialized w/ incoming Schema and can't rely on hoodieTable.getTableSchema(). If not, we should not be looking at schemaOption only to set tableSchema(in every other call paths). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org