Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/13 16:54:13 UTC

[GitHub] [hudi] nsivabalan opened a new pull request, #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

nsivabalan opened a new pull request, #5850:
URL: https://github.com/apache/hudi/pull/5850

   ## What is the purpose of the pull request
   
   The row writer path throws a NullPointerException when multi-writer support (optimistic concurrency control, OCC) is enabled. This patch fixes it.
   
   ## Brief change log
   
   - The row writer flow was not calling `preWrite`, which is expected to be called in multi-writer setups. This patch adds the missing call in `DataSourceInternalWriterHelper`.
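   The thread does not say which object ends up null. The sketch below is a hypothetical, simplified model, not Hudi's actual write client, of how skipping a `preWrite`-style initialization step can surface later as an NPE. In the model, the commit path under concurrency control reads state that `preWrite` was supposed to set. `SimplifiedWriteClient`, its field, and its method signatures are illustrative assumptions.

```java
import java.util.Optional;

// Hypothetical, simplified model; NOT Hudi's real write client API.
public class PreWriteNpeDemo {

    static class SimplifiedWriteClient {
        // Snapshot of the last completed instant, captured by preWrite().
        // Stays null if preWrite() is never called.
        private Optional<String> lastCompletedInstant;

        void preWrite(Optional<String> lastCompletedOnTimeline) {
            this.lastCompletedInstant = lastCompletedOnTimeline;
        }

        // In this model, commit-time conflict checks read the snapshot taken in preWrite().
        String commit(String instantTime) {
            // Throws NullPointerException if preWrite() was skipped.
            String baseline = lastCompletedInstant.orElse("<none>");
            return "committed " + instantTime + " after " + baseline;
        }
    }

    static String runWithPreWrite() {
        SimplifiedWriteClient client = new SimplifiedWriteClient();
        client.preWrite(Optional.of("001"));
        return client.commit("002");
    }

    static boolean skippingPreWriteThrowsNpe() {
        SimplifiedWriteClient client = new SimplifiedWriteClient();
        try {
            client.commit("002");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithPreWrite());
        System.out.println("NPE without preWrite: " + skippingPreWriteThrowsNpe());
    }
}
```

   In this model, the fix is to make sure `preWrite` runs before the commit path. The PR does that by calling it in the helper's constructor.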
   
   ## Verify this pull request
   
   
   This change added tests and can be verified as follows:
   
   - Added a simple test to TestHoodieSparkSqlWriter that exercises the bulk insert row writer with concurrency control configs.
   - Verified a multi-writer scenario locally with the integration test framework, with both writers ingesting via bulk_insert.
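   A bulk insert through the row writer with OCC enabled is typically configured with options like the ones below. This is a sketch that assumes Hudi 0.11-era config keys. The exact options used by the new test and by the integration test run are not shown in this thread. The in-process lock provider is also an assumption, and it only works when both writers share one JVM.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of writer options for bulk_insert via the row writer with OCC enabled.
// Values, especially the lock provider, are assumptions for illustration.
public class OccBulkInsertOptions {

    static Map<String, String> occBulkInsertOptions(String tableName) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.table.name", tableName);
        opts.put("hoodie.datasource.write.operation", "bulk_insert");
        // Route bulk_insert through the row writer path.
        opts.put("hoodie.datasource.write.row.writer.enable", "true");
        // Enable multi-writer optimistic concurrency control.
        opts.put("hoodie.write.concurrency.mode", "optimistic_concurrency_control");
        // Failed writes are cleaned lazily when multiple writers are active.
        opts.put("hoodie.cleaner.policy.failed.writes", "LAZY");
        // Assumption: in-process locking, valid only for writers in the same JVM.
        opts.put("hoodie.write.lock.provider",
                "org.apache.hudi.client.transaction.lock.InProcessLockProvider");
        return opts;
    }

    public static void main(String[] args) {
        occBulkInsertOptions("trips").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

   The map could be passed to Spark via `df.write().format("hudi").options(opts).mode(SaveMode.Append).save(basePath)`. Writers in separate processes would need a distributed lock provider instead, such as the ZooKeeper-based one.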
   
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    - [ ] Commit message is descriptive of the change
    - [ ] CI is green
    - [ ] Necessary doc changes done or have another open PR
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan merged pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
xushiyan merged PR #5850:
URL: https://github.com/apache/hudi/pull/5850



[GitHub] [hudi] xushiyan commented on a diff in pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #5850:
URL: https://github.com/apache/hudi/pull/5850#discussion_r927161235


##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java:
##########
@@ -68,6 +68,7 @@ public DataSourceInternalWriterHelper(String instantTime, HoodieWriteConfig writ
     this.metaClient = HoodieTableMetaClient.builder().setConf(configuration).setBasePath(writeConfig.getBasePath()).build();
     this.metaClient.validateTableProperties(writeConfig.getProps());
     this.hoodieTable = HoodieSparkTable.create(writeConfig, new HoodieSparkEngineContext(new JavaSparkContext(sparkSession.sparkContext())), metaClient);
+    writeClient.preWrite(instantTime, WriteOperationType.BULK_INSERT, metaClient);

Review Comment:
   Filed https://issues.apache.org/jira/browse/HUDI-4444.




[GitHub] [hudi] hudi-bot commented on pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5850:
URL: https://github.com/apache/hudi/pull/5850#issuecomment-1154162718

   ## CI report:
   
   * fe1e2fefa4311970df9a9caa183333fa76f83f12 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5850:
URL: https://github.com/apache/hudi/pull/5850#issuecomment-1154201264

   ## CI report:
   
   * fe1e2fefa4311970df9a9caa183333fa76f83f12 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9267) 
   



[GitHub] [hudi] hudi-bot commented on pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5850:
URL: https://github.com/apache/hudi/pull/5850#issuecomment-1154205016

   ## CI report:
   
   * fe1e2fefa4311970df9a9caa183333fa76f83f12 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9267) 
   



[GitHub] [hudi] xushiyan commented on a diff in pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
xushiyan commented on code in PR #5850:
URL: https://github.com/apache/hudi/pull/5850#discussion_r927156418


##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java:
##########
@@ -68,6 +68,7 @@ public DataSourceInternalWriterHelper(String instantTime, HoodieWriteConfig writ
     this.metaClient = HoodieTableMetaClient.builder().setConf(configuration).setBasePath(writeConfig.getBasePath()).build();
     this.metaClient.validateTableProperties(writeConfig.getProps());
     this.hoodieTable = HoodieSparkTable.create(writeConfig, new HoodieSparkEngineContext(new JavaSparkContext(sparkSession.sparkContext())), metaClient);
+    writeClient.preWrite(instantTime, WriteOperationType.BULK_INSERT, metaClient);

Review Comment:
   It is not ideal, but it follows an existing pattern: `writeClient.startCommitWithTime` already writes files there. I'd say it's OK to go ahead with this and file a refactoring task.




[GitHub] [hudi] xushiyan commented on pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
xushiyan commented on PR #5850:
URL: https://github.com/apache/hudi/pull/5850#issuecomment-1192010789

   https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=9267&view=results
   CI passed. Landing this now.



[GitHub] [hudi] danny0405 commented on a diff in pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5850:
URL: https://github.com/apache/hudi/pull/5850#discussion_r896351920


##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java:
##########
@@ -68,6 +68,7 @@ public DataSourceInternalWriterHelper(String instantTime, HoodieWriteConfig writ
     this.metaClient = HoodieTableMetaClient.builder().setConf(configuration).setBasePath(writeConfig.getBasePath()).build();
     this.metaClient.validateTableProperties(writeConfig.getProps());
     this.hoodieTable = HoodieSparkTable.create(writeConfig, new HoodieSparkEngineContext(new JavaSparkContext(sparkSession.sparkContext())), metaClient);
+    writeClient.preWrite(instantTime, WriteOperationType.BULK_INSERT, metaClient);

Review Comment:
   The fix makes sense, but it's not a good idea to put write logic in the constructor. WDYT?
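   The concern about write logic in the constructor can be illustrated with a hypothetical, simplified sketch; it is not the actual `DataSourceInternalWriterHelper`. In the sketch, the constructor only stores its dependencies, and the `preWrite` side effect moves into an explicit method that the caller invokes. `RecordingWriteClient`, `WriterHelper`, and `prepareForWrite` are illustrative names. The stand-in client records calls instead of writing to the timeline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hudi code: keep the constructor free of write side effects.
public class SideEffectFreeConstructorDemo {

    // Stand-in write client that records calls instead of touching storage.
    static class RecordingWriteClient {
        final List<String> calls = new ArrayList<>();

        void preWrite(String instantTime, String operationType) {
            calls.add("preWrite:" + instantTime + ":" + operationType);
        }
    }

    static class WriterHelper {
        private final String instantTime;
        private final RecordingWriteClient writeClient;

        // Constructor only wires dependencies; nothing is written here.
        WriterHelper(String instantTime, RecordingWriteClient writeClient) {
            this.instantTime = instantTime;
            this.writeClient = writeClient;
        }

        // Explicit step the caller invokes before writing data.
        void prepareForWrite() {
            writeClient.preWrite(instantTime, "BULK_INSERT");
        }
    }

    public static void main(String[] args) {
        RecordingWriteClient client = new RecordingWriteClient();
        WriterHelper helper = new WriterHelper("002", client);
        System.out.println("after constructor: " + client.calls);
        helper.prepareForWrite();
        System.out.println("after prepareForWrite: " + client.calls);
    }
}
```

   Constructing the helper now has no observable effect on storage. The trade-off is that every caller must remember to invoke the explicit step.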




[GitHub] [hudi] nsivabalan commented on a diff in pull request #5850: [HUDI-4204] Fixing NPE with row writer path and with OCC

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #5850:
URL: https://github.com/apache/hudi/pull/5850#discussion_r905767987


##########
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/internal/DataSourceInternalWriterHelper.java:
##########
@@ -68,6 +68,7 @@ public DataSourceInternalWriterHelper(String instantTime, HoodieWriteConfig writ
     this.metaClient = HoodieTableMetaClient.builder().setConf(configuration).setBasePath(writeConfig.getBasePath()).build();
     this.metaClient.validateTableProperties(writeConfig.getProps());
     this.hoodieTable = HoodieSparkTable.create(writeConfig, new HoodieSparkEngineContext(new JavaSparkContext(sparkSession.sparkContext())), metaClient);
+    writeClient.preWrite(instantTime, WriteOperationType.BULK_INSERT, metaClient);

Review Comment:
   Yeah, makes sense. The only other option I see is to add it to `createInflightCommit()`. Do you suggest doing that, or should we introduce a new method and call it explicitly before calling `createInflightCommit()`?
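   The first option, calling `preWrite` from inside `createInflightCommit()`, could look roughly like the hypothetical sketch below. It is not Hudi code. `RecordingWriteClient`, `WriterHelper`, and `transitionToInflight` are illustrative stand-ins, and the real `createInflightCommit()` signature is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hudi code: preWrite is folded into createInflightCommit
// so that callers cannot forget to run it first.
public class FoldedPreWriteDemo {

    // Stand-in write client that records calls instead of touching the timeline.
    static class RecordingWriteClient {
        final List<String> calls = new ArrayList<>();

        void preWrite(String instantTime) {
            calls.add("preWrite:" + instantTime);
        }

        // Hypothetical name for the step that marks the instant as inflight.
        void transitionToInflight(String instantTime) {
            calls.add("inflight:" + instantTime);
        }
    }

    static class WriterHelper {
        private final String instantTime;
        private final RecordingWriteClient writeClient;

        // No side effects in the constructor.
        WriterHelper(String instantTime, RecordingWriteClient writeClient) {
            this.instantTime = instantTime;
            this.writeClient = writeClient;
        }

        // preWrite always runs immediately before the inflight transition.
        void createInflightCommit() {
            writeClient.preWrite(instantTime);
            writeClient.transitionToInflight(instantTime);
        }
    }

    public static void main(String[] args) {
        RecordingWriteClient client = new RecordingWriteClient();
        new WriterHelper("002", client).createInflightCommit();
        System.out.println(client.calls);
    }
}
```

   Folding the call in guarantees the ordering for every caller. The cost is a side effect hidden behind a method name that mentions only the inflight commit. The alternative, a separate method called explicitly, keeps the side effect visible but relies on callers to invoke it first.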


