You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/14 22:19:49 UTC

[GitHub] [incubator-hudi] afilipchik opened a new pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

afilipchik opened a new pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation
URL: https://github.com/apache/incubator-hudi/pull/1518
 
 
   ## What is the purpose of the pull request
   If schema is inferred from spark dataframe after Sql Transformation. Not sure if it is going to work for continuous mode, need advice. 
   
   ## Brief change log
   
     - Modified DeltaSync to support force schema registration and made sure new schema is registered if Transformer is used. 
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r409887222
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -460,8 +471,17 @@ private void syncHive() {
    * this constraint.
    */
   public void setupWriteClient() {
+    setupWriteClient(schemaProvider, false);
+  }
+
+  /**
+   * Note that depending on configs and source-type, schemaProvider could either be eagerly or lazily created.
+   * SchemaProvider creation is a precursor to HoodieWriteClient and AsyncCompactor creation. This method takes care of
+   * this constraint.
+   */
+  private void setupWriteClient(SchemaProvider schemaProvider, boolean forceRecreate) {
     LOG.info("Setting up Hoodie Write Client");
-    if ((null != schemaProvider) && (null == writeClient)) {
+    if (forceRecreate || (null != schemaProvider) && (null == writeClient)) {
 
 Review comment:
   wondering if we can just redo the schema registration without forcing recreating the entire client? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r409840574
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
 ##########
 @@ -296,10 +296,21 @@ private void refreshTimeline() throws IOException {
 
       // Use Transformed Row's schema if not overridden. If target schema is not specified
       // default to RowBasedSchemaProvider
-      schemaProvider = this.schemaProvider == null || this.schemaProvider.getTargetSchema() == null
-          ? transformed.map(r -> (SchemaProvider) new RowBasedSchemaProvider(r.schema())).orElse(
-          dataAndCheckpoint.getSchemaProvider())
-          : this.schemaProvider;
+      if (this.schemaProvider == null) {
+        schemaProvider =
+            transformed
+                .map(r -> (SchemaProvider) new RowBasedSchemaProvider(r.schema()))
+                .orElse(dataAndCheckpoint.getSchemaProvider());
+      } else if (this.schemaProvider.getTargetSchema() == null) {
+        schemaProvider =
 
 Review comment:
   is n't this block the same as above? why break it up into an else-if? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] afilipchik opened a new pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
afilipchik opened a new pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation
URL: https://github.com/apache/incubator-hudi/pull/1518
 
 
   ## What is the purpose of the pull request
   If schema is inferred from spark dataframe after Sql Transformation. Not sure if it is going to work for continuous mode, need advice. 
   
   ## Brief change log
   
     - Modified DeltaSync to support force schema registration and made sure new schema is registered if Transformer is used. 
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] bvaradar commented on pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
bvaradar commented on pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#issuecomment-629445679


   Going ahead and merging this change. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] afilipchik closed pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
afilipchik closed pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation
URL: https://github.com/apache/incubator-hudi/pull/1518
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] afilipchik commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
afilipchik commented on a change in pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r416381330



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -460,8 +471,17 @@ private void syncHive() {
    * this constraint.
    */
   public void setupWriteClient() {
+    setupWriteClient(schemaProvider, false);
+  }
+
+  /**
+   * Note that depending on configs and source-type, schemaProvider could either be eagerly or lazily created.
+   * SchemaProvider creation is a precursor to HoodieWriteClient and AsyncCompactor creation. This method takes care of
+   * this constraint.
+   */
+  private void setupWriteClient(SchemaProvider schemaProvider, boolean forceRecreate) {
     LOG.info("Setting up Hoodie Write Client");
-    if ((null != schemaProvider) && (null == writeClient)) {
+    if (forceRecreate || (null != schemaProvider) && (null == writeClient)) {

Review comment:
       that is a good question. @bvaradar , any suggestions? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r425626686



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -460,8 +471,17 @@ private void syncHive() {
    * this constraint.
    */
   public void setupWriteClient() {
+    setupWriteClient(schemaProvider, false);
+  }
+
+  /**
+   * Note that depending on configs and source-type, schemaProvider could either be eagerly or lazily created.
+   * SchemaProvider creation is a precursor to HoodieWriteClient and AsyncCompactor creation. This method takes care of
+   * this constraint.
+   */
+  private void setupWriteClient(SchemaProvider schemaProvider, boolean forceRecreate) {
     LOG.info("Setting up Hoodie Write Client");
-    if ((null != schemaProvider) && (null == writeClient)) {
+    if (forceRecreate || (null != schemaProvider) && (null == writeClient)) {

Review comment:
       Actually, I just realized HoodieWriteConfig schema also needs to be updated in this case. Trying to see if we can have a consistent point of creating WriteClient and SchemaProvider 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] codecov-io edited a comment on pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#issuecomment-629195380


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=h1) Report
   > Merging [#1518](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/a64afdfd17ac974e451bceb877f3d40a9c775253&el=desc) will **increase** coverage by `0.06%`.
   > The diff coverage is `94.11%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1518/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1518      +/-   ##
   ============================================
   + Coverage     71.75%   71.81%   +0.06%     
   - Complexity     1089     1092       +3     
   ============================================
     Files           385      386       +1     
     Lines         16599    16608       +9     
     Branches       1668     1667       -1     
   ============================================
   + Hits          11910    11927      +17     
   + Misses         3962     3955       -7     
   + Partials        727      726       -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `73.86% <90.90%> (+1.42%)` | `36.00 <1.00> (-1.00)` | :arrow_up: |
   | [...udi/utilities/schema/DelegatingSchemaProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9EZWxlZ2F0aW5nU2NoZW1hUHJvdmlkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | `3.00 <3.00> (?)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `67.56% <0.00%> (+10.81%)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `80.00% <0.00%> (+40.00%)` | `0.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=footer). Last update [a64afdf...8fe14bd](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r425603474



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -460,8 +471,17 @@ private void syncHive() {
    * this constraint.
    */
   public void setupWriteClient() {
+    setupWriteClient(schemaProvider, false);
+  }
+
+  /**
+   * Note that depending on configs and source-type, schemaProvider could either be eagerly or lazily created.
+   * SchemaProvider creation is a precursor to HoodieWriteClient and AsyncCompactor creation. This method takes care of
+   * this constraint.
+   */
+  private void setupWriteClient(SchemaProvider schemaProvider, boolean forceRecreate) {
     LOG.info("Setting up Hoodie Write Client");
-    if ((null != schemaProvider) && (null == writeClient)) {
+    if (forceRecreate || (null != schemaProvider) && (null == writeClient)) {

Review comment:
       I agree. We should just do schema re-registration. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r420192036



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -460,8 +471,17 @@ private void syncHive() {
    * this constraint.
    */
   public void setupWriteClient() {
+    setupWriteClient(schemaProvider, false);
+  }
+
+  /**
+   * Note that depending on configs and source-type, schemaProvider could either be eagerly or lazily created.
+   * SchemaProvider creation is a precursor to HoodieWriteClient and AsyncCompactor creation. This method takes care of
+   * this constraint.
+   */
+  private void setupWriteClient(SchemaProvider schemaProvider, boolean forceRecreate) {
     LOG.info("Setting up Hoodie Write Client");
-    if ((null != schemaProvider) && (null == writeClient)) {
+    if (forceRecreate || (null != schemaProvider) && (null == writeClient)) {

Review comment:
       @bvaradar ping again :) 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] codecov-io commented on pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#issuecomment-629195380


   # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=h1) Report
   > Merging [#1518](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-hudi/commit/a64afdfd17ac974e451bceb877f3d40a9c775253&el=desc) will **increase** coverage by `0.06%`.
   > The diff coverage is `94.11%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-hudi/pull/1518/graphs/tree.svg?width=650&height=150&src=pr&token=VTTXabwbs2)](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #1518      +/-   ##
   ============================================
   + Coverage     71.75%   71.81%   +0.06%     
   - Complexity     1089     1092       +3     
   ============================================
     Files           385      386       +1     
     Lines         16599    16608       +9     
     Branches       1668     1667       -1     
   ============================================
   + Hits          11910    11927      +17     
   + Misses         3962     3955       -7     
   + Partials        727      726       -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=tree) | Coverage Δ | Complexity Δ | |
   |---|---|---|---|
   | [...apache/hudi/utilities/deltastreamer/DeltaSync.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL2RlbHRhc3RyZWFtZXIvRGVsdGFTeW5jLmphdmE=) | `73.86% <90.90%> (+1.42%)` | `36.00 <1.00> (-1.00)` | :arrow_up: |
   | [...udi/utilities/schema/DelegatingSchemaProvider.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS11dGlsaXRpZXMvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvdXRpbGl0aWVzL3NjaGVtYS9EZWxlZ2F0aW5nU2NoZW1hUHJvdmlkZXIuamF2YQ==) | `100.00% <100.00%> (ø)` | `3.00 <3.00> (?)` | |
   | [...src/main/java/org/apache/hudi/metrics/Metrics.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9NZXRyaWNzLmphdmE=) | `67.56% <0.00%> (+10.81%)` | `0.00% <0.00%> (ø%)` | |
   | [...g/apache/hudi/metrics/InMemoryMetricsReporter.java](https://codecov.io/gh/apache/incubator-hudi/pull/1518/diff?src=pr&el=tree#diff-aHVkaS1jbGllbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2h1ZGkvbWV0cmljcy9Jbk1lbW9yeU1ldHJpY3NSZXBvcnRlci5qYXZh) | `80.00% <0.00%> (+40.00%)` | `0.00% <0.00%> (ø%)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=footer). Last update [a64afdf...8fe14bd](https://codecov.io/gh/apache/incubator-hudi/pull/1518?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] vinothchandar commented on pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#issuecomment-628747799


   @bvaradar Could you review this one?  It hits close to the transformer/schemaprovider changes, which you are more familiar with 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] bvaradar merged pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
bvaradar merged pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r426010971



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -296,10 +296,21 @@ private void refreshTimeline() throws IOException {
 
       // Use Transformed Row's schema if not overridden. If target schema is not specified
       // default to RowBasedSchemaProvider
-      schemaProvider = this.schemaProvider == null || this.schemaProvider.getTargetSchema() == null
-          ? transformed.map(r -> (SchemaProvider) new RowBasedSchemaProvider(r.schema())).orElse(
-          dataAndCheckpoint.getSchemaProvider())
-          : this.schemaProvider;
+      if (this.schemaProvider == null) {
+        schemaProvider =
+            transformed
+                .map(r -> (SchemaProvider) new RowBasedSchemaProvider(r.schema()))
+                .orElse(dataAndCheckpoint.getSchemaProvider());
+      } else if (this.schemaProvider.getTargetSchema() == null) {
+        schemaProvider =

Review comment:
       Fixed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1518: [HUDI-723] Register avro schema if infered from SQL transformation

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1518:
URL: https://github.com/apache/incubator-hudi/pull/1518#discussion_r425603474



##########
File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##########
@@ -460,8 +471,17 @@ private void syncHive() {
    * this constraint.
    */
   public void setupWriteClient() {
+    setupWriteClient(schemaProvider, false);
+  }
+
+  /**
+   * Note that depending on configs and source-type, schemaProvider could either be eagerly or lazily created.
+   * SchemaProvider creation is a precursor to HoodieWriteClient and AsyncCompactor creation. This method takes care of
+   * this constraint.
+   */
+  private void setupWriteClient(SchemaProvider schemaProvider, boolean forceRecreate) {
     LOG.info("Setting up Hoodie Write Client");
-    if ((null != schemaProvider) && (null == writeClient)) {
+    if (forceRecreate || (null != schemaProvider) && (null == writeClient)) {

Review comment:
       I agree. We should just do schema re-registration alone and not recreate client.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org