Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/05 02:07:52 UTC

[GitHub] [hudi] nsivabalan opened a new pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

nsivabalan opened a new pull request #4509:
URL: https://github.com/apache/hudi/pull/4509


   ## What is the purpose of the pull request
   
   When a table created via DeltaStreamer has only one commit and that commit is empty, the commit metadata may contain no schema (depending on how the schema provider is configured).
   
   In such cases, an incremental read from this table finds no schema in the commit metadata and fails with a NullPointerException (NPE).
   
   ## Brief change log
   
   Fixed IncrementalRelation to return an empty RDD in such cases.
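
A minimal sketch of the guard described above, assuming the schema is looked up in the latest commit's extra metadata. The `Row` alias, the `"schema"` key, and the `scan` callback are hypothetical stand-ins for illustration, not the actual Hudi or Spark API.

```scala
// Hypothetical stand-ins: the real code works with Hudi commit metadata and Spark rows.
object EmptyCommitGuard {
  type Row = Map[String, Any]

  // Look up the schema string; a missing key, a null value, or a blank value all count as "no schema".
  // The "schema" key is an assumption made for this sketch.
  def schemaFrom(extraMetadata: Map[String, String]): Option[String] =
    extraMetadata.get("schema").flatMap(s => Option(s)).filter(_.trim.nonEmpty)

  // Return no rows when no schema was recorded, instead of dereferencing a null schema.
  def incrementalRead(extraMetadata: Map[String, String],
                      scan: String => Seq[Row]): Seq[Row] =
    schemaFrom(extraMetadata) match {
      case Some(schema) => scan(schema)
      case None         => Seq.empty
    }
}
```

With this shape, a table whose only commit is empty yields an empty result rather than an NPE, and the scan callback is never invoked without a schema.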
   
   ## Verify this pull request
   
   - I could not reproduce this locally: I tried with a Parquet DFS source and FileBasedSchemaProvider, so the schema was populated and the incremental query returned an empty DataFrame. Will try to poke around to validate the fix.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4509:
URL: https://github.com/apache/hudi/pull/4509#issuecomment-1005321184


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "81384070a72540a9571cbf5c96eb1d185ce0fc90",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "81384070a72540a9571cbf5c96eb1d185ce0fc90",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 81384070a72540a9571cbf5c96eb1d185ce0fc90 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinishjail97 commented on a change in pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
vinishjail97 commented on a change in pull request #4509:
URL: https://github.com/apache/hudi/pull/4509#discussion_r778806599



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##########
@@ -89,8 +89,13 @@ class IncrementalRelation(val sqlContext: SQLContext,
     } else {
       schemaResolver.getTableAvroSchemaWithoutMetadataFields()
     }
-    val dataSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableSchema)
-    StructType(skeletonSchema.fields ++ dataSchema.fields)
+    if (tableSchema == null) {

Review comment:
       `tableSchema.getType == Schema.Type.NULL` is the expected boolean expression for checking whether an Avro schema is null.
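
The two checks discussed here answer different questions: `== null` tests for a missing object, while a type comparison tests for an object that describes the null type (and would itself throw if called on a null reference). The sketch below illustrates the difference with a hypothetical stand-in `Schema` class, not the real `org.apache.avro.Schema`; which case applies in the PR depends on what the schema resolver returns, which the thread does not state.

```scala
object NullSchemaChecks {
  // Hypothetical stand-in for an Avro-like schema; not the real Avro class.
  sealed trait SchemaType
  case object NullType extends SchemaType
  case object RecordType extends SchemaType
  final case class Schema(getType: SchemaType)

  // Covers both cases: a null reference and a schema object of the null type.
  // The reference check comes first so getType is never called on null.
  def isMissing(schema: Schema): Boolean =
    schema == null || schema.getType == NullType
}
```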







[GitHub] [hudi] hudi-bot commented on pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4509:
URL: https://github.com/apache/hudi/pull/4509#issuecomment-1005346122


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "81384070a72540a9571cbf5c96eb1d185ce0fc90",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4891",
       "triggerID" : "81384070a72540a9571cbf5c96eb1d185ce0fc90",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 81384070a72540a9571cbf5c96eb1d185ce0fc90 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4891) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] vinishjail97 commented on a change in pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
vinishjail97 commented on a change in pull request #4509:
URL: https://github.com/apache/hudi/pull/4509#discussion_r778807350



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala
##########
@@ -89,8 +89,13 @@ class IncrementalRelation(val sqlContext: SQLContext,
     } else {
       schemaResolver.getTableAvroSchemaWithoutMetadataFields()
     }
-    val dataSchema = AvroConversionUtils.convertAvroSchemaToStructType(tableSchema)
-    StructType(skeletonSchema.fields ++ dataSchema.fields)
+    if (tableSchema == null) {
+      // if there is only one commit in the table and is an empty commit without schema, return empty RDD here
+      null

Review comment:
       For some reason Scala did not accept returning null here, so I used `StructType(Nil)` and adjusted the check on usedSchema accordingly.
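
A rough sketch of the sentinel approach mentioned in this comment: an empty struct stands for "no table schema", and the scan path tests for it instead of comparing against null. `StructField` and `StructType` below are simplified, hypothetical stand-ins for the Spark classes, and `usedSchema`, `scan`, and `readRows` are illustrative names only.

```scala
object EmptySchemaSentinel {
  // Hypothetical stand-ins for Spark's StructField / StructType.
  final case class StructField(name: String, dataType: String)
  final case class StructType(fields: Seq[StructField])

  // Build the schema used by the relation; an empty StructType marks "no table schema".
  def usedSchema(skeleton: StructType, data: Option[StructType]): StructType =
    data match {
      case Some(d) => StructType(skeleton.fields ++ d.fields)
      case None    => StructType(Nil)
    }

  // Downstream code tests the sentinel instead of comparing against null.
  def scan(schema: StructType, readRows: StructType => Seq[Seq[Any]]): Seq[Seq[Any]] =
    if (schema.fields.isEmpty) Seq.empty else readRows(schema)
}
```

One trade-off of a sentinel is that it cannot be told apart from a genuinely empty schema, which is acceptable only if such a schema never occurs for a real table.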







[GitHub] [hudi] hudi-bot commented on pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4509:
URL: https://github.com/apache/hudi/pull/4509#issuecomment-1005322404


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "81384070a72540a9571cbf5c96eb1d185ce0fc90",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4891",
       "triggerID" : "81384070a72540a9571cbf5c96eb1d185ce0fc90",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 81384070a72540a9571cbf5c96eb1d185ce0fc90 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4891) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4509:
URL: https://github.com/apache/hudi/pull/4509#issuecomment-1005322404







[GitHub] [hudi] nsivabalan closed pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
nsivabalan closed pull request #4509:
URL: https://github.com/apache/hudi/pull/4509


   





[GitHub] [hudi] hudi-bot removed a comment on pull request #4509: [HUDI-3168] Fixing null schema with empty commit in incremental relation

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4509:
URL: https://github.com/apache/hudi/pull/4509#issuecomment-1005321184



