Posted to commits@hudi.apache.org by "pramodbiligiri (via GitHub)" <gi...@apache.org> on 2023/02/06 07:46:59 UTC

[GitHub] [hudi] pramodbiligiri opened a new pull request, #7864: [HUDI-5688] Small workaround that can prevent the NPE

pramodbiligiri opened a new pull request, #7864:
URL: https://github.com/apache/hudi/pull/7864

   Relates to: https://issues.apache.org/jira/browse/HUDI-5688
   
   A small workaround showing that substituting an empty StructType() makes the NPE go away. Don't consider this a fix yet; it is just a validation of the bug report.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on PR #7864:
URL: https://github.com/apache/hudi/pull/7864#issuecomment-1470879470

   Closing this in favor of https://github.com/apache/hudi/pull/8174
   




[GitHub] [hudi] hudi-bot commented on pull request #7864: [HUDI-5688] Small workaround that can prevent the NPE

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7864:
URL: https://github.com/apache/hudi/pull/7864#issuecomment-1418673588

   ## CI report:
   
   * 530f6035723351a95a74dd23645dac65fcb03055 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14958) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #7864: [HUDI-5688] Small workaround that can prevent the NPE

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7864:
URL: https://github.com/apache/hudi/pull/7864#issuecomment-1418664110

   ## CI report:
   
   * 530f6035723351a95a74dd23645dac65fcb03055 UNKNOWN




[GitHub] [hudi] nsivabalan commented on a diff in pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7864:
URL: https://github.com/apache/hudi/pull/7864#discussion_r1099719787


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -241,7 +241,12 @@ object DefaultSource {
     }
 
     if (metaClient.getCommitsTimeline.filterCompletedInstants.countInstants() == 0) {
-      new EmptyRelation(sqlContext, resolveSchema(metaClient, parameters, Some(schema)))
+      val structType = resolveSchema(metaClient, parameters, Some(schema))

Review Comment:
   Can you help me understand why we might get a null schema while reading a Hudi table?





[GitHub] [hudi] pramodbiligiri commented on a diff in pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

Posted by "pramodbiligiri (via GitHub)" <gi...@apache.org>.
pramodbiligiri commented on code in PR #7864:
URL: https://github.com/apache/hudi/pull/7864#discussion_r1105298405


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -241,7 +241,12 @@ object DefaultSource {
     }
 
     if (metaClient.getCommitsTimeline.filterCompletedInstants.countInstants() == 0) {
-      new EmptyRelation(sqlContext, resolveSchema(metaClient, parameters, Some(schema)))
+      val structType = resolveSchema(metaClient, parameters, Some(schema))

Review Comment:
   The issue has been confirmed as a valid bug in the JIRA: https://issues.apache.org/jira/browse/HUDI-5688?focusedCommentId=17688209&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17688209





[GitHub] [hudi] pramodbiligiri commented on a diff in pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

Posted by "pramodbiligiri (via GitHub)" <gi...@apache.org>.
pramodbiligiri commented on code in PR #7864:
URL: https://github.com/apache/hudi/pull/7864#discussion_r1099939380


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -241,7 +241,12 @@ object DefaultSource {
     }
 
     if (metaClient.getCommitsTimeline.filterCompletedInstants.countInstants() == 0) {
-      new EmptyRelation(sqlContext, resolveSchema(metaClient, parameters, Some(schema)))
+      val structType = resolveSchema(metaClient, parameters, Some(schema))

Review Comment:
   Perhaps this bit from my JIRA ticket can help? "If there are no completed instants in the table, and there is no user defined schema for it as well (as represented by the userSpecifiedSchema field in DataSource.scala), then the EmptyRelation returned by DefaultSource.createRelation sets schema of the EmptyRelation to null. This breaks the contract of Spark's BaseRelation, where the schema is a StructType but is not expected to be null."
   
   Edit: If you see the 3rd and 4th screenshots in the JIRA (https://issues.apache.org/jira/browse/HUDI-5688) you'll see that TableSchemaResolver.getTableAvroSchema() throws an Exception. I didn't debug beyond that.
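
The null-schema case described above can be guarded against with a small fallback. The following is only a sketch of the idea behind the workaround, not the code from this PR: `NullSafeSchema` and `orEmpty` are hypothetical names introduced for illustration, and it assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.types.StructType

object NullSafeSchema {
  // Hypothetical helper (not part of Hudi): replaces a null schema with an
  // empty StructType so that callers of BaseRelation.schema never see null.
  def orEmpty(resolved: StructType): StructType =
    Option(resolved).getOrElse(new StructType())

  def main(args: Array[String]): Unit = {
    // Simulates the situation in the bug report: schema resolution yields null.
    val missing: StructType = null
    val safe = orEmpty(missing)
    assert(safe != null && safe.fields.isEmpty)

    // A schema that was resolved successfully passes through unchanged.
    val known = new StructType().add("id", "long")
    assert(orEmpty(known) == known)
  }
}
```

An empty StructType only hides the symptom. Why schema resolution fails for such a table is a separate question that this sketch does not address.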





[GitHub] [hudi] nsivabalan closed pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan closed pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema
URL: https://github.com/apache/hudi/pull/7864




[GitHub] [hudi] pramodbiligiri commented on a diff in pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

Posted by "pramodbiligiri (via GitHub)" <gi...@apache.org>.
pramodbiligiri commented on code in PR #7864:
URL: https://github.com/apache/hudi/pull/7864#discussion_r1099736301


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -241,7 +241,12 @@ object DefaultSource {
     }
 
     if (metaClient.getCommitsTimeline.filterCompletedInstants.countInstants() == 0) {
-      new EmptyRelation(sqlContext, resolveSchema(metaClient, parameters, Some(schema)))
+      val structType = resolveSchema(metaClient, parameters, Some(schema))

Review Comment:
   Actually, I'm not sure about that part. I have a database where this behaviour can be reproduced. I'll see if I can share that table after anonymizing it.





[GitHub] [hudi] pramodbiligiri commented on a diff in pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

Posted by "pramodbiligiri (via GitHub)" <gi...@apache.org>.
pramodbiligiri commented on code in PR #7864:
URL: https://github.com/apache/hudi/pull/7864#discussion_r1100984391


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala:
##########
@@ -241,7 +241,12 @@ object DefaultSource {
     }
 
     if (metaClient.getCommitsTimeline.filterCompletedInstants.countInstants() == 0) {
-      new EmptyRelation(sqlContext, resolveSchema(metaClient, parameters, Some(schema)))
+      val structType = resolveSchema(metaClient, parameters, Some(schema))

Review Comment:
   Since this seems to happen on an empty table with no completed commits, shall we keep this as a valid Hudi issue then? (The scope of my current task was only to check whether it's a valid issue.) I can imagine situations like this, where people query a table before it is populated. I think it ought to return empty rows rather than throw an NPE. What do you think?
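
The expected behaviour (an unpopulated table yields zero rows rather than an NPE) can be illustrated with a minimal relation. This is a sketch under assumptions, not Hudi's actual EmptyRelation: `NoRowsRelation` and `NoRowsDemo` are hypothetical names, and it assumes Spark 3.x running in local mode.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.StructType

// Hypothetical relation for illustration only; Hudi's own EmptyRelation may differ.
class NoRowsRelation(val sqlContext: SQLContext, val schema: StructType)
    extends BaseRelation with TableScan {
  // A table with no completed commits has nothing to scan.
  override def buildScan(): RDD[Row] = sqlContext.sparkContext.emptyRDD[Row]
}

object NoRowsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("no-rows-demo")
      .getOrCreate()
    try {
      // A non-null, empty schema satisfies the BaseRelation contract.
      val relation = new NoRowsRelation(spark.sqlContext, new StructType())
      val df = spark.baseRelationToDataFrame(relation)
      assert(df.schema.isEmpty)
      assert(df.count() == 0L)
    } finally {
      spark.stop()
    }
  }
}
```

With a non-null schema in place, the query completes and reports no rows, which is the outcome argued for above.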





[GitHub] [hudi] hudi-bot commented on pull request #7864: [HUDI-5688] Small workaround that can prevent the NPE

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7864:
URL: https://github.com/apache/hudi/pull/7864#issuecomment-1419213970

   ## CI report:
   
   * 530f6035723351a95a74dd23645dac65fcb03055 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14958) 

