Posted to dev@griffin.apache.org by "William Guo (JIRA)" <ji...@apache.org> on 2019/07/19 08:37:00 UTC
[jira] [Assigned] (GRIFFIN-266) [Service] Measure's rules are not always properly sorted
[ https://issues.apache.org/jira/browse/GRIFFIN-266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
William Guo reassigned GRIFFIN-266:
-----------------------------------
Assignee: Kevin Yao
> [Service] Measure's rules are not always properly sorted
> --------------------------------------------------------
>
> Key: GRIFFIN-266
> URL: https://issues.apache.org/jira/browse/GRIFFIN-266
> Project: Griffin
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Nevena Veljkovic
> Assignee: Kevin Yao
> Priority: Major
> Fix For: 0.6.0
>
>
> If a measure has more than one rule, which is common practice for dsl.type spark-sql, the measure's rules may be returned in the wrong order, which causes the job to fail.
> Example:
> GET measure by id returns the rules in this order: 3005, 3006 and then 3004 (it should be 3004, 3005, 3006):
> {code:java}
> {
> "id": 3005,
> "rule": "SELECT count(*) as incomplete FROM source WHERE (node_metrics_pk IS NULL) OR (node_master_fk IS NULL) OR (location_id IS NULL) OR (freq_band IS NULL) OR (ts IS NULL) ",
> "dsl.type": "spark-sql",
> "dq.type": null,
> "out.dataframe.name": "incomplete_count",
> "out": [
> {
> "type": "record",
> "name": "incomplete_count"
> },
> {
> "type": "metric",
> "name": "incomplete_count"
> }
> ]
> },
> {
> "id": 3006,
> "rule": "SELECT (total - incomplete) AS complete FROM total_count LEFT JOIN incomplete_count",
> "dsl.type": "spark-sql",
> "dq.type": null,
> "out.dataframe.name": "complete_count",
> "out": [
> {
> "type": "metric",
> "name": "complete_count"
> }
> ]
> },
> {
> "id": 3004,
> "rule": "SELECT COUNT(*) AS total FROM source",
> "dsl.type": "spark-sql",
> "dq.type": null,
> "out.dataframe.name": "total_count",
> "out": [
> {
> "type": "record",
> "name": "total_count"
> },
> {
> "type": "metric",
> "name": "total_count"
> }
> ]
> }
> {code}
>
> Griffin job fails with error:
> {code:java}
> 19/07/11 11:00:31 ERROR transform.SparkSqlTransformStep: run spark sql [ SELECT (total - incomplete) AS complete FROM total_count LEFT JOIN incomplete_count ] error: Table or view not found: total_count; line 1 pos 45
> org.apache.spark.sql.AnalysisException: Table or view not found: total_count
> {code}
> As the log shows, execution of rule 3006 fails because rule 3004, which produces total_count, has not been executed yet (due to the incorrect ordering).
> The measure entity EvaluateRule.java does not specify any ordering for its rules:
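The failure mode can be reproduced without Spark. The following is a minimal sketch, not Griffin code: the `Rule` class, the `firstFailingRule` helper, and the idea of tracking "available" table names are all hypothetical stand-ins. Only the rule ids, output names, and table dependencies are taken from the JSON and error log above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RuleOrderDemo {
    /** Hypothetical stand-in for a rule: its id, the table it registers, the tables it reads. */
    static final class Rule {
        final long id;
        final String outName;
        final List<String> inputs;

        Rule(long id, String outName, String... inputs) {
            this.id = id;
            this.outName = outName;
            this.inputs = Arrays.asList(inputs);
        }
    }

    static final Rule R3004 = new Rule(3004, "total_count", "source");
    static final Rule R3005 = new Rule(3005, "incomplete_count", "source");
    static final Rule R3006 = new Rule(3006, "complete_count", "total_count", "incomplete_count");

    /** Order reported in the issue: 3005, 3006, 3004. */
    static List<Rule> returnedOrder() {
        return Arrays.asList(R3005, R3006, R3004);
    }

    /** Order the reporter expects: 3004, 3005, 3006. */
    static List<Rule> expectedOrder() {
        return Arrays.asList(R3004, R3005, R3006);
    }

    /**
     * Walks the rules in list order and returns the id of the first rule that
     * reads a table no earlier rule has produced, or -1 if every rule can run.
     */
    static long firstFailingRule(List<Rule> rules) {
        Set<String> available = new HashSet<>();
        available.add("source"); // the data source exists before any rule runs
        for (Rule rule : rules) {
            if (!available.containsAll(rule.inputs)) {
                return rule.id;
            }
            available.add(rule.outName);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstFailingRule(returnedOrder())); // 3006
        System.out.println(firstFailingRule(expectedOrder())); // -1
    }
}
```

In the returned order, rule 3006 is reached before anything has produced total_count, which matches the "Table or view not found: total_count" error in the log.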
> [https://github.com/apache/griffin/blob/master/service/src/main/java/org/apache/griffin/core/measure/entity/EvaluateRule.java#L32-L38]
> According to the PostgreSQL documentation ([https://www.postgresql.org/docs/9.3/sql-select.html]):
> If the ORDER BY clause is specified, the returned rows are sorted in the specified order.
> If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
> The proposed solution is to specify the ordering explicitly in EvaluateRule.java.
>
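One way to picture the proposed fix is an explicit sort by rule id before the rules are handed to the job. This is only a sketch under assumptions: the `Rule` class and `sortById` helper below are hypothetical and are not taken from EvaluateRule.java. In a JPA entity the same effect might be achieved declaratively, for instance with a `javax.persistence.OrderBy` annotation on the rules collection, but the issue text does not say which approach was adopted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortRulesById {
    /** Hypothetical minimal rule holding only the fields needed for ordering. */
    static final class Rule {
        private final long id;
        private final String outName;

        Rule(long id, String outName) {
            this.id = id;
            this.outName = outName;
        }

        long getId() {
            return id;
        }

        String getOutName() {
            return outName;
        }
    }

    /** Returns a new list sorted by ascending id; the input list is left untouched. */
    static List<Rule> sortById(List<Rule> rules) {
        List<Rule> sorted = new ArrayList<>(rules);
        sorted.sort(Comparator.comparingLong(Rule::getId));
        return sorted;
    }

    /** The order reported in the issue: 3005, 3006, 3004. */
    static List<Rule> returnedOrder() {
        return Arrays.asList(
                new Rule(3005, "incomplete_count"),
                new Rule(3006, "complete_count"),
                new Rule(3004, "total_count"));
    }

    public static void main(String[] args) {
        for (Rule rule : sortById(returnedOrder())) {
            System.out.println(rule.getId() + " -> " + rule.getOutName());
        }
    }
}
```

Sorting by id works for this measure only because the ids happen to follow the order in which the rules were defined; it does not analyse which tables each rule reads.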