Posted to dev@atlas.apache.org by Sarath Subramanian <sa...@apache.org> on 2019/12/11 23:59:34 UTC
Review Request 71902: ATLAS-3558: Improve lineage performance using
in-memory traversal
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------
Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.
Bugs: ATLAS-3558
https://issues.apache.org/jira/browse/ATLAS-3558
Repository: atlas
Description
-------
Lineage in Atlas uses a graph query to compute lineage across entities (inputs, outputs, or both). Lineage rendering performance has degraded since the move to JanusGraph 0.4.0.
On investigation, initialization and execution of the lineage graph query through the Gremlin script engine was found to be the bottleneck.
An alternate in-memory computation of lineage improves performance substantially (~90% improvement). This JIRA adds that alternate in-memory computation.
The "atlas.use.graph.query.for.lineage" property can be used to toggle between graph query and in-memory computation of lineage. In-memory computation is the default.
Diffs
-----
intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d
Diff: https://reviews.apache.org/r/71902/diff/1/
Testing
-------
Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1581/console
Thanks,
Sarath Subramanian
Re: Review Request 71902: ATLAS-3558: Improve lineage performance
using in-memory traversal
Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/#review219011
-----------------------------------------------------------
intg/src/main/java/org/apache/atlas/AtlasConfiguration.java
Lines 66 (patched)
<https://reviews.apache.org/r/71902/#comment307029>
LINEAGE_GRAPH_QUERY("atlas.use.graph.query.for.lineage", false) => LINEAGE_USING_GREMLIN("atlas.lineage.query.use.gremlin", false)
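A minimal sketch of what a boolean configuration constant of this shape could look like. The constant name, property name, and default come from the suggestion above; the enum name `LineageConfigSketch`, the `getBoolean` method, and the use of a plain `Map` as the configuration source are assumptions for illustration, not the actual AtlasConfiguration API.

```java
import java.util.Map;

enum LineageConfigSketch {
    // Name, property key, and default follow the reviewer's proposal.
    LINEAGE_USING_GREMLIN("atlas.lineage.query.use.gremlin", false);

    private final String  propertyName;
    private final boolean defaultValue;

    LineageConfigSketch(String propertyName, boolean defaultValue) {
        this.propertyName = propertyName;
        this.defaultValue = defaultValue;
    }

    String getPropertyName() {
        return propertyName;
    }

    // Hypothetical lookup: a plain Map stands in for the real configuration source.
    boolean getBoolean(Map<String, String> overrides) {
        String value = overrides.get(propertyName);

        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }
}
```

With the default of false, a caller would take the in-memory path unless the property is explicitly set to true.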
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 84 (patched)
<https://reviews.apache.org/r/71902/#comment307030>
useGraphQueryForLineage => LINEAGE_USING_GREMLIN
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 205 (patched)
<https://reviews.apache.org/r/71902/#comment307023>
To be consistent, consider replacing "UsingGraphQuery" with "V1" as suffix.
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 264 (patched)
<https://reviews.apache.org/r/71902/#comment307034>
Consider returning "Collection<AtlasEdge>" from this method, as the caller uses only the value of the returned map.
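To illustrate the idea, here is a small sketch in which a helper still de-duplicates by id internally but exposes only the values. Strings stand in for AtlasEdge, and `EdgeCollectionSketch` and `collectEdges` are hypothetical names, not the method under review.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

final class EdgeCollectionSketch {
    // The map is an implementation detail used for de-duplication by edge id;
    // callers only ever see the collection of values.
    static Collection<String> collectEdges(String... edgeIds) {
        Map<String, String> edgesById = new LinkedHashMap<>();

        for (String id : edgeIds) {
            edgesById.putIfAbsent(id, "edge:" + id);
        }

        return edgesById.values();
    }
}
```

Returning the narrower type keeps the keying scheme private, so it can change later without touching callers.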
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 280 (patched)
<https://reviews.apache.org/r/71902/#comment307025>
How is 'direction == BOTH' handled here? Looks like only PROCESS_OUTPUTS_EDGE edges will be traversed. Please review.
    AtlasVertex processVertex = AtlasGraphUtilsV2.findByGuid(guid);

    if (direction == INPUT || direction == BOTH) {
        Iterable<AtlasEdge> processEdges = processVertex.getEdges(AtlasEdgeDirection.OUT, PROCESS_INPUTS_EDGE);

        for (AtlasEdge processEdge : processEdges) {
            AtlasVertex datasetVertex = processEdge.getInVertex();

            traverseEdges(datasetVertex, INPUT, depth - 1, ret);
        }
    }

    if (direction == OUTPUT || direction == BOTH) {
        Iterable<AtlasEdge> processEdges = processVertex.getEdges(AtlasEdgeDirection.OUT, PROCESS_OUTPUTS_EDGE);

        for (AtlasEdge processEdge : processEdges) {
            AtlasVertex datasetVertex = processEdge.getInVertex();

            traverseEdges(datasetVertex, OUTPUT, depth - 1, ret);
        }
    }
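The direction handling in the snippet above can be exercised in isolation with a self-contained model. This is only a sketch: lists of dataset ids stand in for the edges of a process vertex, and `DirectionSketch` and `datasetsFor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class DirectionSketch {
    enum Direction { INPUT, OUTPUT, BOTH }

    // Two independent 'if' blocks (not if/else) so that BOTH visits
    // the input side and the output side.
    static List<String> datasetsFor(List<String> inputs, List<String> outputs, Direction direction) {
        List<String> ret = new ArrayList<>();

        if (direction == Direction.INPUT || direction == Direction.BOTH) {
            ret.addAll(inputs);
        }

        if (direction == Direction.OUTPUT || direction == Direction.BOTH) {
            ret.addAll(outputs);
        }

        return ret;
    }
}
```

With an if/else on a single edge label, a BOTH request would follow only one side, which is the concern raised in the comment.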
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 303 (patched)
<https://reviews.apache.org/r/71902/#comment307031>
visitedIds => visitedVertices
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 308 (patched)
<https://reviews.apache.org/r/71902/#comment307028>
Given 'BOTH' is not expected here, perhaps the parameter can be:
"LineageDirection direction" => "boolean isInput"
Same in #299 as well.
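As a rough illustration of the boolean-parameter variant, the sketch below picks an edge label from a flag. The label values are placeholders rather than Atlas's actual constants, and `EdgeLabelSketch` and `edgeLabel` are hypothetical names.

```java
final class EdgeLabelSketch {
    // Placeholder label values, assumed for this sketch only.
    static final String PROCESS_INPUTS_EDGE  = "inputs-edge-label";
    static final String PROCESS_OUTPUTS_EDGE = "outputs-edge-label";

    // A two-valued flag leaves no room for an unexpected BOTH at this level.
    static String edgeLabel(boolean isInput) {
        return isInput ? PROCESS_INPUTS_EDGE : PROCESS_OUTPUTS_EDGE;
    }
}
```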
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 312 (patched)
<https://reviews.apache.org/r/71902/#comment307027>
- Can 'incomingEdge' already be present in 'ret'? If so, the rest of this loop body can be skipped with a 'continue' at #312. Please review.
- Can 'processVertex' already have been visited? If so, the rest of this loop body can be skipped with a 'continue' at #313. For this to work, processVertex should be added to 'visitedIds'. Please review.
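The two checks can be sketched on a toy graph as follows. This is an assumption-laden model, not the patched method: string ids stand in for vertices and edges, and `VisitedSketch`, `Edge`, and `lineageEdges` are hypothetical names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class VisitedSketch {
    // Hypothetical edge: an id plus the id of the vertex it leads to.
    static final class Edge {
        final String id;
        final String target;

        Edge(String id, String target) {
            this.id     = id;
            this.target = target;
        }
    }

    static Set<String> lineageEdges(String start, Map<String, List<Edge>> outgoing) {
        Set<String> visitedVertices = new HashSet<>();
        Set<String> ret             = new LinkedHashSet<>();

        visitedVertices.add(start);
        traverse(start, outgoing, visitedVertices, ret);

        return ret;
    }

    private static void traverse(String vertex, Map<String, List<Edge>> outgoing,
                                 Set<String> visitedVertices, Set<String> ret) {
        for (Edge edge : outgoing.getOrDefault(vertex, Collections.<Edge>emptyList())) {
            if (!ret.add(edge.id)) {
                continue; // edge already collected: nothing new behind it
            }

            if (!visitedVertices.add(edge.target)) {
                continue; // vertex already expanded: avoids looping on circular lineage
            }

            traverse(edge.target, outgoing, visitedVertices, ret);
        }
    }
}
```

Because a vertex is recorded as visited before it is expanded, a circular lineage such as a -> b -> a terminates after each edge has been collected once.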
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 320 (patched)
<https://reviews.apache.org/r/71902/#comment307033>
Looks like #320 can be moved before entering this 'for' loop, at #315. Please review.
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 340 (patched)
<https://reviews.apache.org/r/71902/#comment307024>
startVertex => vertex
- Madhan Neethiraj
Re: Review Request 71902: ATLAS-3558: Improve lineage performance
using in-memory traversal
Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/#review219035
-----------------------------------------------------------
Ship It!
- Madhan Neethiraj
Re: Review Request 71902: ATLAS-3558: Improve lineage performance
using in-memory traversal
Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------
(Updated Dec. 13, 2019, 7:53 p.m.)
Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.
Bugs: ATLAS-3558
https://issues.apache.org/jira/browse/ATLAS-3558
Repository: atlas
Diffs (updated)
-----
intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d
Diff: https://reviews.apache.org/r/71902/diff/5/
Changes: https://reviews.apache.org/r/71902/diff/4-5/
Testing
-------
Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1585/console
Manually validated that lineage rendering works for simple, complex, and circular lineages.
Thanks,
Sarath Subramanian
Re: Review Request 71902: ATLAS-3558: Improve lineage performance
using in-memory traversal
Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------
(Updated Dec. 13, 2019, 4:38 p.m.)
Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.
Bugs: ATLAS-3558
https://issues.apache.org/jira/browse/ATLAS-3558
Repository: atlas
Diffs (updated)
-----
intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d
Diff: https://reviews.apache.org/r/71902/diff/4/
Changes: https://reviews.apache.org/r/71902/diff/3-4/
Testing
-------
Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1585/console
Manually validated that lineage rendering works for simple, complex, and circular lineages.
Thanks,
Sarath Subramanian
Re: Review Request 71902: ATLAS-3558: Improve lineage performance
using in-memory traversal
Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------
(Updated Dec. 13, 2019, 2:54 p.m.)
Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.
Changes
-------
Addressed review comments.
Bugs: ATLAS-3558
https://issues.apache.org/jira/browse/ATLAS-3558
Repository: atlas
Diffs (updated)
-----
intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d
Diff: https://reviews.apache.org/r/71902/diff/3/
Changes: https://reviews.apache.org/r/71902/diff/2-3/
Testing
-------
Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1581/console
Manually validated that lineage rendering works for simple, complex, and circular lineages.
Thanks,
Sarath Subramanian
Re: Review Request 71902: ATLAS-3558: Improve lineage performance
using in-memory traversal
Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------
(Updated Dec. 12, 2019, 3:20 p.m.)
Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.
Changes
-------
Addressed review comments.
Bugs: ATLAS-3558
https://issues.apache.org/jira/browse/ATLAS-3558
Repository: atlas
Diffs (updated)
-----
intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3
repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d
Diff: https://reviews.apache.org/r/71902/diff/2/
Changes: https://reviews.apache.org/r/71902/diff/1-2/
Testing
-------
Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1581/console
Manually validated that lineage rendering works for simple, complex, and circular lineages.
Thanks,
Sarath Subramanian