Posted to issues@hive.apache.org by "Shivangi (Jira)" <ji...@apache.org> on 2021/09/09 21:16:00 UTC
[jira] [Updated] (HIVE-25510) Incorrect lineage for compare expressions in select statements
[ https://issues.apache.org/jira/browse/HIVE-25510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shivangi updated HIVE-25510:
----------------------------
Description:
Incorrect lineage is generated for queries that have comparison expressions in the select list. For example:
*`Case-when` in select statement:*
Query:
{code:java}
select place, (case when city == "aa" then id else 0 end)/id from t1;
{code}
Corresponding Lineage:
{code:java}
{
  "edges": [
    {
      "sources": [2],
      "targets": [0],
      "edgeType": "PROJECTION"
    },
    {
      "sources": [3, 4],
      "targets": [1],
      "expression": "(UDFToDouble(CASE WHEN ((UDFToString(t1.city) = 'aa')) THEN (t1.id) ELSE (0) END) / UDFToDouble(t1.id))",
      "edgeType": "PROJECTION"
    }
  ],
  "vertices": [
    {"id": 0, "vertexType": "COLUMN", "vertexId": "place"},
    {"id": 1, "vertexType": "COLUMN", "vertexId": "_c1"},
    {"id": 2, "vertexType": "COLUMN", "vertexId": "default.t1.place"},
    {"id": 3, "vertexType": "COLUMN", "vertexId": "default.t1.city"},
    {"id": 4, "vertexType": "COLUMN", "vertexId": "default.t1.id"}
  ]
}
{code}
Expected Lineage:
{code:java}
{
  "edges": [
    {
      "sources": [2],
      "targets": [0],
      "edgeType": "PROJECTION"
    },
    {
      "sources": [3],
      "targets": [1],
      "expression": "(UDFToDouble(CASE WHEN ((UDFToString(t1.city) = 'aa')) THEN (t1.id) ELSE (0) END) / UDFToDouble(t1.id))",
      "edgeType": "PROJECTION"
    },
    {
      "sources": [4],
      "targets": [1],
      "expression": "CASE WHEN ((UDFToString(t1.city) = 'aa')) THEN (t1.id) ELSE (0) END",
      "edgeType": "PREDICATE"
    }
  ],
  "vertices": [
    {"id": 0, "vertexType": "COLUMN", "vertexId": "place"},
    {"id": 1, "vertexType": "COLUMN", "vertexId": "_c1"},
    {"id": 2, "vertexType": "COLUMN", "vertexId": "default.t1.place"},
    {"id": 3, "vertexType": "COLUMN", "vertexId": "default.t1.id"},
    {"id": 4, "vertexType": "COLUMN", "vertexId": "default.t1.city"}
  ]
}
{code}
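Note that the vertex ids differ between the two graphs above (ids 3 and 4 are swapped), so comparing edges by id is misleading. The following is a minimal sketch, not part of Hive, that resolves ids to column names before comparing. The helper names `edges_by_name` and `column` are hypothetical, and the `expression` fields are left out for brevity.

```python
def edges_by_name(lineage):
    """Return the edges as (edgeType, source names, target names) tuples."""
    names = {v["id"]: v["vertexId"] for v in lineage["vertices"]}
    return {
        (
            edge["edgeType"],
            frozenset(names[i] for i in edge["sources"]),
            frozenset(names[i] for i in edge["targets"]),
        )
        for edge in lineage["edges"]
    }


def column(vertex_id, name):
    """Build a COLUMN vertex in the shape used by the lineage JSON."""
    return {"id": vertex_id, "vertexType": "COLUMN", "vertexId": name}


# Lineage currently produced for the CASE WHEN query (expressions omitted).
actual = {
    "edges": [
        {"sources": [2], "targets": [0], "edgeType": "PROJECTION"},
        {"sources": [3, 4], "targets": [1], "edgeType": "PROJECTION"},
    ],
    "vertices": [
        column(0, "place"), column(1, "_c1"), column(2, "default.t1.place"),
        column(3, "default.t1.city"), column(4, "default.t1.id"),
    ],
}

# Lineage expected by this report (expressions omitted).
expected = {
    "edges": [
        {"sources": [2], "targets": [0], "edgeType": "PROJECTION"},
        {"sources": [3], "targets": [1], "edgeType": "PROJECTION"},
        {"sources": [4], "targets": [1], "edgeType": "PREDICATE"},
    ],
    "vertices": [
        column(0, "place"), column(1, "_c1"), column(2, "default.t1.place"),
        column(3, "default.t1.id"), column(4, "default.t1.city"),
    ],
}

# Edges present in the actual lineage but not expected, and vice versa.
unexpected = edges_by_name(actual) - edges_by_name(expected)
missing = edges_by_name(expected) - edges_by_name(actual)
print("unexpected:", unexpected)
print("missing:", missing)
```

Compared by name, the only unexpected edge is the projection from both `default.t1.city` and `default.t1.id` to `_c1`; the missing edges are the projection from `default.t1.id` alone and the predicate edge from `default.t1.city`.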
*`IF` statement in select statement:*
Query:
{code:java}
select IF(city='aa',place,'FALSE') from t1;
{code}
Corresponding lineage:
{code:java}
{
  "edges": [
    {
      "sources": [1, 2],
      "targets": [0],
      "expression": "if((UDFToString(t1.city) = 'aa'), t1.place, 'FALSE')",
      "edgeType": "PROJECTION"
    }
  ],
  "vertices": [
    {"id": 0, "vertexType": "COLUMN", "vertexId": "_c0"},
    {"id": 1, "vertexType": "COLUMN", "vertexId": "default.t1.city"},
    {"id": 2, "vertexType": "COLUMN", "vertexId": "default.t1.place"}
  ]
}
{code}
Expected Lineage:
The projection edge for target `vertex 0` should have only `vertex 2` as its source, and there should also be one predicate edge with `vertex 1` as its source and `vertex 0` as its target.
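The expected shape for the `IF` query can be sketched by post-processing the lineage shown for it. This is an illustration only and does not reflect how Hive computes lineage: `split_condition_columns` is a hypothetical helper, the set of condition-only columns is supplied by hand, and the `expression` of the predicate edge is omitted because the report does not state it.

```python
def split_condition_columns(lineage, condition_columns):
    """Move sources that appear only in a condition from PROJECTION edges
    into separate PREDICATE edges with the same targets."""
    names = {v["id"]: v["vertexId"] for v in lineage["vertices"]}
    new_edges = []
    for edge in lineage["edges"]:
        cond = [i for i in edge["sources"] if names[i] in condition_columns]
        if edge["edgeType"] != "PROJECTION" or not cond:
            new_edges.append(edge)
            continue
        rest = [i for i in edge["sources"] if names[i] not in condition_columns]
        # Assumption: the projection edge is kept even if no sources remain;
        # the report does not cover that case.
        new_edges.append({**edge, "sources": rest})
        # Assumption: no expression is attached to the predicate edge.
        new_edges.append(
            {"sources": cond, "targets": edge["targets"], "edgeType": "PREDICATE"}
        )
    return {**lineage, "edges": new_edges}


# Lineage currently produced for: select IF(city='aa',place,'FALSE') from t1;
actual_if = {
    "edges": [
        {
            "sources": [1, 2],
            "targets": [0],
            "expression": "if((UDFToString(t1.city) = 'aa'), t1.place, 'FALSE')",
            "edgeType": "PROJECTION",
        }
    ],
    "vertices": [
        {"id": 0, "vertexType": "COLUMN", "vertexId": "_c0"},
        {"id": 1, "vertexType": "COLUMN", "vertexId": "default.t1.city"},
        {"id": 2, "vertexType": "COLUMN", "vertexId": "default.t1.place"},
    ],
}

# city is used only in the IF condition, so it becomes a predicate source.
expected_if = split_condition_columns(actual_if, {"default.t1.city"})
for edge in expected_if["edges"]:
    print(edge["edgeType"], edge["sources"], "->", edge["targets"])
```

The result has a projection edge from `vertex 2` to `vertex 0` and a predicate edge from `vertex 1` to `vertex 0`, matching the description above.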
> Incorrect lineage for compare expressions in select statements
> --------------------------------------------------------------
>
> Key: HIVE-25510
> URL: https://issues.apache.org/jira/browse/HIVE-25510
> Project: Hive
> Issue Type: Bug
> Components: lineage
> Reporter: Shivangi
> Assignee: Shivangi
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)