Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/15 01:33:55 UTC

[GitHub] [hudi] yihua opened a new pull request, #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

yihua opened a new pull request, #6676:
URL: https://github.com/apache/hudi/pull/6676

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1260286424

   ## CI report:
   
   * edf92e5ee1ef5ee9df4db8649e94277875fa00db Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11801) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] weimingdiit commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by "weimingdiit (via GitHub)" <gi...@apache.org>.
weimingdiit commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1517739451

   @yihua @nsivabalan  The ORC schema may have the same problem in the `getBootstrapSourceSchemaOrc` method.




[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1248919461

   ## CI report:
   
   * fa203bff2e2bb9fc27e50f0b0c2613770bfa5dc6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11400) 
   


[GitHub] [hudi] yihua commented on a diff in pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6676:
URL: https://github.com/apache/hudi/pull/6676#discussion_r981669014


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/bootstrap/HoodieSparkBootstrapSchemaProvider.java:
##########
@@ -72,26 +68,15 @@ protected Schema getBootstrapSourceSchema(HoodieEngineContext context, List<Pair
   }
 
   private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
-    Configuration hadoopConf = context.getHadoopConf().get();
-    MessageType parquetSchema = new ParquetUtils().readSchema(hadoopConf, filePath);
-
-    hadoopConf.set(
-        SQLConf.PARQUET_BINARY_AS_STRING().key(),
-        SQLConf.PARQUET_BINARY_AS_STRING().defaultValueString());
-    hadoopConf.set(
-        SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(),
-        SQLConf.PARQUET_INT96_AS_TIMESTAMP().defaultValueString());
-    hadoopConf.set(
-        SQLConf.CASE_SENSITIVE().key(),
-        SQLConf.CASE_SENSITIVE().defaultValueString());
-    ParquetToSparkSchemaConverter converter = new ParquetToSparkSchemaConverter(hadoopConf);
-
-    StructType sparkSchema = converter.convert(parquetSchema);
+    StructType parquetSchema = ((HoodieSparkEngineContext) context).getSqlContext().read()
+        .option("basePath", writeConfig.getBootstrapSourceBasePath())
+        .parquet(filePath.toString())
+        .schema();
     String tableName = HoodieAvroUtils.sanitizeName(writeConfig.getTableName());
     String structName = tableName + "_record";
     String recordNamespace = "hoodie." + tableName;
 
-    return AvroConversionUtils.convertStructTypeToAvroSchema(sparkSchema, structName, recordNamespace);
+    return AvroConversionUtils.convertStructTypeToAvroSchema(parquetSchema, structName, recordNamespace);

Review Comment:
   Good callout!  I was debugging the schema resolution in `HoodieBootstrapRelation` and didn't put the changes up for review.  Now, I verified that we don't need to append the partition column in `HoodieBootstrapRelation` as the partition column is already available from the table/read schema.



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/bootstrap/HoodieSparkBootstrapSchemaProvider.java:
##########
@@ -72,26 +68,15 @@ protected Schema getBootstrapSourceSchema(HoodieEngineContext context, List<Pair
   }
 
   private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
-    Configuration hadoopConf = context.getHadoopConf().get();
-    MessageType parquetSchema = new ParquetUtils().readSchema(hadoopConf, filePath);
-
-    hadoopConf.set(
-        SQLConf.PARQUET_BINARY_AS_STRING().key(),
-        SQLConf.PARQUET_BINARY_AS_STRING().defaultValueString());
-    hadoopConf.set(
-        SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(),

Review Comment:
   Given that we're using the SQL context to read the schema of the parquet table, all the SQLConf defaults are automatically added, so these config settings are not needed anymore.
   
   The current bootstrap operation always assumes the partition column is String typed, so we have to turn off type inference for the partition column to stay consistent with the existing behavior for now.  I created HUDI-4932 to add a config knob so that other partition column types can be supported in the future.
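
The idea behind reading the schema with `basePath` set can be illustrated with a small, self-contained sketch. It is not Spark's or Hudi's actual implementation: the class `PartitionSchemaSketch`, the method `partitionColumns`, and the example paths are all hypothetical. It assumes Hive-style `key=value` directories between the base path and the data file, and it keeps every partition value as a String, which mirrors the "type inference turned off" behavior described above. In Spark itself, partition type inference is governed by the `spark.sql.sources.partitionColumnTypeInference.enabled` setting; the quoted diff does not show how the PR applies it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionSchemaSketch {

  /**
   * Hypothetical helper: collects partition columns from the directories that sit
   * between basePath and the data file, assuming Hive-style "key=value" names.
   * Values are kept as raw Strings (no date or numeric inference).
   */
  public static Map<String, String> partitionColumns(String basePath, String filePath) {
    // Normalize so that "/data/source" does not accidentally match "/data/source_table".
    String base = basePath.endsWith("/") ? basePath : basePath + "/";
    if (!filePath.startsWith(base)) {
      throw new IllegalArgumentException(
          "File " + filePath + " is not under base path " + basePath);
    }

    String[] segments = filePath.substring(base.length()).split("/");
    Map<String, String> columns = new LinkedHashMap<>();

    // The last segment is the file name, so only the directories before it are inspected.
    for (int i = 0; i < segments.length - 1; i++) {
      int eq = segments[i].indexOf('=');
      if (eq > 0) {
        columns.put(segments[i].substring(0, eq), segments[i].substring(eq + 1));
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    Map<String, String> columns = partitionColumns(
        "/data/source_table",
        "/data/source_table/datestr=2022-09-15/part-0000.parquet");
    // The value stays a String; it is not converted to a date type.
    System.out.println(columns);
  }
}
```

Without a base path, a reader that only sees the single file has no way to know that `datestr` is a column of the table, which is why the schema read from one file's footer alone would omit the partition columns.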





[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1248840961

   ## CI report:
   
   * fa203bff2e2bb9fc27e50f0b0c2613770bfa5dc6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11400) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1260042495

   ## CI report:
   
   * b79e74df026d34bb97e28ae7d8092442f1432fb0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11799) 
   * edf92e5ee1ef5ee9df4db8649e94277875fa00db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11801) 
   


[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1248813922

   ## CI report:
   
   * fa203bff2e2bb9fc27e50f0b0c2613770bfa5dc6 UNKNOWN
   


[GitHub] [hudi] codope merged pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
codope merged PR #6676:
URL: https://github.com/apache/hudi/pull/6676




[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1259970305

   ## CI report:
   
   * fa203bff2e2bb9fc27e50f0b0c2613770bfa5dc6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11400) 
   * b79e74df026d34bb97e28ae7d8092442f1432fb0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11799) 
   * edf92e5ee1ef5ee9df4db8649e94277875fa00db UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1259975545

   ## CI report:
   
   * b79e74df026d34bb97e28ae7d8092442f1432fb0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11799) 
   * edf92e5ee1ef5ee9df4db8649e94277875fa00db UNKNOWN
   


[GitHub] [hudi] hudi-bot commented on pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6676:
URL: https://github.com/apache/hudi/pull/6676#issuecomment-1259964983

   ## CI report:
   
   * fa203bff2e2bb9fc27e50f0b0c2613770bfa5dc6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11400) 
   * b79e74df026d34bb97e28ae7d8092442f1432fb0 UNKNOWN
   


[GitHub] [hudi] codope commented on a diff in pull request #6676: [HUDI-4453] Fix schema to include partition columns in bootstrap operation

Posted by GitBox <gi...@apache.org>.
codope commented on code in PR #6676:
URL: https://github.com/apache/hudi/pull/6676#discussion_r974229919


##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/bootstrap/HoodieSparkBootstrapSchemaProvider.java:
##########
@@ -72,26 +68,15 @@ protected Schema getBootstrapSourceSchema(HoodieEngineContext context, List<Pair
   }
 
   private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
-    Configuration hadoopConf = context.getHadoopConf().get();
-    MessageType parquetSchema = new ParquetUtils().readSchema(hadoopConf, filePath);
-
-    hadoopConf.set(
-        SQLConf.PARQUET_BINARY_AS_STRING().key(),
-        SQLConf.PARQUET_BINARY_AS_STRING().defaultValueString());
-    hadoopConf.set(
-        SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(),

Review Comment:
   Should we keep these configs? Does spark...schema() infer string date values as date type?



##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/bootstrap/HoodieSparkBootstrapSchemaProvider.java:
##########
@@ -72,26 +68,15 @@ protected Schema getBootstrapSourceSchema(HoodieEngineContext context, List<Pair
   }
 
   private static Schema getBootstrapSourceSchemaParquet(HoodieWriteConfig writeConfig, HoodieEngineContext context, Path filePath) {
-    Configuration hadoopConf = context.getHadoopConf().get();
-    MessageType parquetSchema = new ParquetUtils().readSchema(hadoopConf, filePath);
-
-    hadoopConf.set(
-        SQLConf.PARQUET_BINARY_AS_STRING().key(),
-        SQLConf.PARQUET_BINARY_AS_STRING().defaultValueString());
-    hadoopConf.set(
-        SQLConf.PARQUET_INT96_AS_TIMESTAMP().key(),
-        SQLConf.PARQUET_INT96_AS_TIMESTAMP().defaultValueString());
-    hadoopConf.set(
-        SQLConf.CASE_SENSITIVE().key(),
-        SQLConf.CASE_SENSITIVE().defaultValueString());
-    ParquetToSparkSchemaConverter converter = new ParquetToSparkSchemaConverter(hadoopConf);
-
-    StructType sparkSchema = converter.convert(parquetSchema);
+    StructType parquetSchema = ((HoodieSparkEngineContext) context).getSqlContext().read()
+        .option("basePath", writeConfig.getBootstrapSourceBasePath())
+        .parquet(filePath.toString())
+        .schema();
     String tableName = HoodieAvroUtils.sanitizeName(writeConfig.getTableName());
     String structName = tableName + "_record";
     String recordNamespace = "hoodie." + tableName;
 
-    return AvroConversionUtils.convertStructTypeToAvroSchema(sparkSchema, structName, recordNamespace);
+    return AvroConversionUtils.convertStructTypeToAvroSchema(parquetSchema, structName, recordNamespace);

Review Comment:
   Now that the partition column will be written to the commit metadata, do we still need to append the partition column in `HoodieBootstrapRelation`?


