Posted to dev@griffin.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/01/06 09:38:00 UTC

[jira] [Work logged] (GRIFFIN-316) Spark runtime exception cannot be caught while running a dq application

     [ https://issues.apache.org/jira/browse/GRIFFIN-316?focusedWorklogId=366542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-366542 ]

ASF GitHub Bot logged work on GRIFFIN-316:
------------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Jan/20 09:37
            Start Date: 06/Jan/20 09:37
    Worklog Time Spent: 10m 
      Work Description: chitralverma commented on pull request #562: [GRIFFIN-316] Fix job exception handling
URL: https://github.com/apache/griffin/pull/562#discussion_r363218555
 
 

 ##########
 File path: measure/src/main/scala/org/apache/griffin/measure/job/DQJob.scala
 ##########
 @@ -25,8 +27,16 @@ case class DQJob(dqSteps: Seq[DQStep]) extends Serializable {
   /**
    * @return execution success
    */
-  def execute(context: DQContext): Boolean = {
-    dqSteps.forall(dqStep => dqStep.execute(context))
+  def execute(context: DQContext): Try[Boolean] = {
+    val tmp = dqSteps.map(dqStep => dqStep.execute(context))
 
 Review comment:
   Mapping over a `Seq[_]` is eager, not lazy, so every `.execute` call runs right here. If any of them throws, the exception will not be handled.
   
   Also, please use a more descriptive variable name than `tmp`.
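
   One possible way to address this, given only as a minimal sketch and not as the change made in the PR: wrap each step's call in `Try` and fold over the steps, stopping at the first exception or `false` result. `Ctx`, `Step` and `TryPerStep` are hypothetical stand-ins for `DQContext`, `DQStep` and the job class; they are not Griffin's actual types.

```scala
import scala.util.{Success, Try}

// Hypothetical stand-ins for Griffin's DQContext and DQStep (illustration only).
final case class Ctx(name: String)

trait Step {
  def execute(context: Ctx): Boolean
}

object TryPerStep {
  // Run the steps in order, wrapping each call in Try.
  // Once a step throws or returns false, the remaining steps are skipped
  // and that Failure or Success(false) becomes the overall result.
  def executeAll(steps: Seq[Step], context: Ctx): Try[Boolean] =
    steps.foldLeft(Try(true)) {
      case (Success(true), step) => Try(step.execute(context))
      case (stopped, _)          => stopped
    }

  // Two sample steps used to exercise the sketch.
  val okStep: Step = new Step {
    def execute(context: Ctx): Boolean = true
  }
  val throwingStep: Step = new Step {
    def execute(context: Ctx): Boolean =
      throw new IllegalStateException("invalid rule")
  }
}
```

   With this shape, `TryPerStep.executeAll(Seq(okStep, throwingStep, okStep), Ctx("demo"))` yields a `Failure` carrying the exception instead of letting it escape from `map`, and the caller can decide how to report it.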
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 366542)
    Time Spent: 40m  (was: 0.5h)

> Spark runtime exception cannot be caught while running a dq application
> -----------------------------------------------------------------------
>
>                 Key: GRIFFIN-316
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-316
>             Project: Griffin
>          Issue Type: Bug
>    Affects Versions: 0.4.0, 0.5.0, 0.6.0
>            Reporter: Yu LIU
>            Priority: Major
>             Fix For: 0.6.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> If we put an invalid rule in a batch job (which happens quite often, given that the rules are evaluated at runtime via Spark SQL), the exception thrown by SparkSession is not caught and passed on to the user via a "Try" instance. Instead, the job appears to succeed and a "Success" is returned.
> The reason is that we only wrap the returned Boolean result in a "Try" at the outermost level, in DQApp.run, so an exception thrown deeper in the call stack cannot be caught.
>  
> Here is an example config file to reproduce the issue:
> {noformat}
> {
>   "name": "prof_batch",
>   "process.type": "batch",
>   "timestamp": 123456,
>   "data.sources": [
>     {
>       "name": "source",
>       "connectors": [
>         {
>           "type": "avro",
>           "version": "1.7",
>           "dataframe.name" : "this_table",
>           "config": {
>             "file.name": "src/test/resources/users_info_src.avro"
>           },
>           "pre.proc": [
>             {
>               "dsl.type": "spark-sql",
>               "rule": "select * from this_table where user_id < 10014"
>             }
>           ]
>         }
>       ]
>     }
>   ],
>   "evaluate.rule": {
>     "rules": [
>       {
>         "dsl.type": "griffin-dsl",
>         "dq.type": "profiling",
>         "out.dataframe.name": "prof",
>         "rule": "xxx",
>         "out":[
>           {
>             "type": "metric",
>             "name": "prof",
>             "flatten": "array"
>           }
>         ]
>       }
>     ]
>   },
>   "sinks": ["CONSOLE"]
> }
> {noformat}
>  


