Posted to dev@griffin.apache.org by "William Guo (JIRA)" <ji...@apache.org> on 2019/07/19 08:37:00 UTC
[jira] [Assigned] (GRIFFIN-266) [Service] Measure's rules are not always properly sorted

     [ https://issues.apache.org/jira/browse/GRIFFIN-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Guo reassigned GRIFFIN-266:
-----------------------------------

    Assignee: Kevin Yao

> [Service] Measure's rules are not always properly sorted
> --------------------------------------------------------
>
>                 Key: GRIFFIN-266
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-266
>             Project: Griffin
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Nevena Veljkovic
>            Assignee: Kevin Yao
>            Priority: Major
>             Fix For: 0.6.0
>
>
> If a measure has more than one rule, which is common practice for dsl.type spark-sql, its rules may be returned in the wrong order, which causes the job to fail.
> Example:
> GET measure by id returns the rules in the order 3005, 3006, 3004 (it should be 3004, 3005, 3006):
> {code:java}
> [
>     {
>         "id": 3005,
>         "rule": "SELECT count(*) as incomplete FROM source WHERE (node_metrics_pk IS NULL) OR (node_master_fk IS NULL) OR (location_id IS NULL) OR (freq_band IS NULL) OR (ts IS NULL) ",
>         "dsl.type": "spark-sql",
>         "dq.type": null,
>         "out.dataframe.name": "incomplete_count",
>         "out": [
>             {
>                 "type": "record",
>                 "name": "incomplete_count"
>             },
>             {
>                 "type": "metric",
>                 "name": "incomplete_count"
>             }
>         ]
>     },
>     {
>         "id": 3006,
>         "rule": "SELECT (total - incomplete) AS complete FROM total_count LEFT JOIN incomplete_count",
>         "dsl.type": "spark-sql",
>         "dq.type": null,
>         "out.dataframe.name": "complete_count",
>         "out": [
>             {
>                 "type": "metric",
>                 "name": "complete_count"
>             }
>         ]
>     },
>     {
>         "id": 3004,
>         "rule": "SELECT COUNT(*) AS total FROM source",
>         "dsl.type": "spark-sql",
>         "dq.type": null,
>         "out.dataframe.name": "total_count",
>         "out": [
>             {
>                 "type": "record",
>                 "name": "total_count"
>             },
>             {
>                 "type": "metric",
>                 "name": "total_count"
>             }
>         ]
>     }
> ]
> {code}
>  
> Griffin job fails with error:
> {code:java}
> 19/07/11 11:00:31 ERROR transform.SparkSqlTransformStep: run spark sql [ SELECT (total - incomplete) AS complete FROM total_count LEFT JOIN incomplete_count ] error: Table or view not found: total_count; line 1 pos 45
> org.apache.spark.sql.AnalysisException: Table or view not found: total_count
> {code}
> As we can see, rule 3006 fails because rule 3004, which produces total_count, has not been executed yet (due to the incorrect ordering).
> The measure entity EvaluateRule.java does not define any ordering for its rules:
>  [https://github.com/apache/griffin/blob/master/service/src/main/java/org/apache/griffin/core/measure/entity/EvaluateRule.java#L32-L38]
> According to the PostgreSQL documentation: [https://www.postgresql.org/docs/9.3/sql-select.html]
>  If the ORDER BY clause is specified, the returned rows are sorted in the specified order.
>  If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
> The proposed solution is to define an explicit ordering of the rules in EvaluateRule.java.
>   
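The dependency that breaks here can be checked mechanically. The sketch below is hypothetical: RuleOrderCheck, RuleInfo and firstUnsatisfiedRule are illustrative names, not Griffin classes, and each rule's input tables are listed by hand instead of being parsed from its SQL. It walks the rules in the given order and reports the first one that reads a table that is neither the data source nor the output of an earlier rule.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuleOrderCheck {

    // Hypothetical, simplified view of a measure rule: its id, the dataframe
    // it writes (out.dataframe.name) and the tables its SQL reads, listed by hand.
    static final class RuleInfo {
        final long id;
        final String outName;
        final List<String> inputs;

        RuleInfo(long id, String outName, List<String> inputs) {
            this.id = id;
            this.outName = outName;
            this.inputs = inputs;
        }
    }

    // Returns the id of the first rule that reads a table which is neither a
    // base source nor the output of an earlier rule, or -1 if the order is valid.
    static long firstUnsatisfiedRule(List<RuleInfo> rules, Set<String> sources) {
        Set<String> available = new HashSet<>(sources);
        for (RuleInfo r : rules) {
            if (!available.containsAll(r.inputs)) {
                return r.id;
            }
            // This rule's output becomes readable by every later rule.
            available.add(r.outName);
        }
        return -1;
    }

    public static void main(String[] args) {
        RuleInfo total = new RuleInfo(3004, "total_count", List.of("source"));
        RuleInfo incomplete = new RuleInfo(3005, "incomplete_count", List.of("source"));
        RuleInfo complete = new RuleInfo(3006, "complete_count",
                List.of("total_count", "incomplete_count"));
        Set<String> sources = Set.of("source");

        // Order returned by GET measure in this report: 3005, 3006, 3004.
        System.out.println(firstUnsatisfiedRule(List.of(incomplete, complete, total), sources)); // 3006
        // Ascending id order: 3004, 3005, 3006.
        System.out.println(firstUnsatisfiedRule(List.of(total, incomplete, complete), sources)); // -1
    }
}
```

With the order from the report the check stops at 3006, matching the "Table or view not found: total_count" error; with ascending ids every input is produced before it is read.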

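A minimal sketch of the proposed ordering, assuming rule ids are assigned in creation order so that ascending id matches the intended execution order (as in the example above). On a JPA entity, the standard @OrderBy("id ASC") annotation on the rules collection asks the persistence provider to order the collection when it is loaded; the plain-Java version below applies the same ordering in memory (for example in the service layer) and needs no persistence library. Rule and sortById are hypothetical stand-ins, not Griffin's actual classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class RuleSorting {

    // Hypothetical stand-in for a measure rule; only the fields needed here.
    static final class Rule {
        final long id;
        final String outDataFrameName;

        Rule(long id, String outDataFrameName) {
            this.id = id;
            this.outDataFrameName = outDataFrameName;
        }
    }

    // Returns a new list ordered by ascending id; the input list is not modified.
    static List<Rule> sortById(List<Rule> rules) {
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingLong((Rule r) -> r.id));
        return sorted;
    }

    public static void main(String[] args) {
        // Order returned by GET measure in this report.
        List<Rule> fromDb = List.of(
                new Rule(3005, "incomplete_count"),
                new Rule(3006, "complete_count"),
                new Rule(3004, "total_count"));

        // Prints 3004 -> total_count, 3005 -> incomplete_count, 3006 -> complete_count
        for (Rule r : sortById(fromDb)) {
            System.out.println(r.id + " -> " + r.outDataFrameName);
        }
    }
}
```

Sorting by id only works if ids reflect the intended order; an explicit position column or ordering derived from rule dependencies would not rely on that assumption.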


--
This message was sent by Atlassian JIRA
(v7.6.14#76016)