Posted to dev@atlas.apache.org by Sarath Subramanian <sa...@apache.org> on 2019/12/11 23:59:34 UTC

Review Request 71902: ATLAS-3558: Improve lineage performance using in-memory traversal

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------

Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.


Bugs: ATLAS-3558
    https://issues.apache.org/jira/browse/ATLAS-3558


Repository: atlas


Description
-------

Atlas uses a graph query to compute lineage across entities (inputs, outputs, or both). Lineage rendering performance has degraded since the move to JanusGraph 0.4.0.

On investigation, initialization and execution of the lineage graph query through the Gremlin script engine were found to be the bottleneck.

An alternate in-memory computation of lineage improved performance significantly (~90% improvement). This Jira adds that in-memory computation.

The "atlas.use.graph.query.for.lineage" property toggles between graph query and in-memory computation of lineage. The default is in-memory.

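A property-driven toggle like the one in the description could be wired up roughly as follows. This is a minimal sketch, not the code in the patch: the class LineageToggleSketch, the LineageComputer interface, and both stub implementations are hypothetical. Only the property name and its in-memory default come from the description.

```java
import java.util.Properties;

class LineageToggleSketch {
    // Property name taken from the review description; everything else is illustrative.
    static final String USE_GRAPH_QUERY_PROP = "atlas.use.graph.query.for.lineage";

    /** Hypothetical strategy interface standing in for the two lineage code paths. */
    interface LineageComputer {
        String compute(String guid);
    }

    /** Returns false (in-memory traversal) when the property is absent. */
    static boolean useGraphQuery(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(USE_GRAPH_QUERY_PROP, "false"));
    }

    /** Picks one of two stub implementations based on the flag. */
    static LineageComputer select(Properties conf) {
        if (useGraphQuery(conf)) {
            return guid -> "graph-query:" + guid; // stand-in for the Gremlin path
        }
        return guid -> "in-memory:" + guid;       // stand-in for the in-memory path
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(select(conf).compute("guid-1")); // default: in-memory path

        conf.setProperty(USE_GRAPH_QUERY_PROP, "true");
        System.out.println(select(conf).compute("guid-1")); // flag set: graph-query path
    }
}
```

Reading the flag once and selecting a strategy keeps the two code paths separate, so the graph-query path stays available as a fallback.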

Diffs
-----

  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3 
  repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d 


Diff: https://reviews.apache.org/r/71902/diff/1/


Testing
-------

Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1581/console


Thanks,

Sarath Subramanian


Re: Review Request 71902: ATLAS-3558: Improve lineage performance using in-memory traversal

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/#review219011
-----------------------------------------------------------




intg/src/main/java/org/apache/atlas/AtlasConfiguration.java
Lines 66 (patched)
<https://reviews.apache.org/r/71902/#comment307029>

    LINEAGE_GRAPH_QUERY("atlas.use.graph.query.for.lineage", false) => LINEAGE_USING_GREMLIN("atlas.lineage.query.use.gremlin", false)



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 84 (patched)
<https://reviews.apache.org/r/71902/#comment307030>

    useGraphQueryForLineage => LINEAGE_USING_GREMLIN



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 205 (patched)
<https://reviews.apache.org/r/71902/#comment307023>

    To be consistent, consider replacing "UsingGraphQuery" with "V1" as suffix.



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 264 (patched)
<https://reviews.apache.org/r/71902/#comment307034>

    Consider returning "Collection<AtlasEdge>" from this method, as the caller uses only the value of the returned map.



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 280 (patched)
<https://reviews.apache.org/r/71902/#comment307025>

    How is 'direction == BOTH' handled here? Looks like only PROCESS_OUTPUTS_EDGE edges will be traversed. Please review.
    
      AtlasVertex processVertex = AtlasGraphUtilsV2.findByGuid(guid);
    
      if (direction == INPUT || direction == BOTH) {
        Iterable<AtlasEdge> processEdges = processVertex.getEdges(AtlasEdgeDirection.OUT, PROCESS_INPUTS_EDGE);
    
        for (AtlasEdge processEdge : processEdges) {
          AtlasVertex datasetVertex = processEdge.getInVertex();
    
          traverseEdges(datasetVertex, INPUT, depth - 1, ret);
        }
      }
    
      if (direction == OUTPUT || direction == BOTH) {
        Iterable<AtlasEdge> processEdges = processVertex.getEdges(AtlasEdgeDirection.OUT, PROCESS_OUTPUTS_EDGE);
    
        for (AtlasEdge processEdge : processEdges) {
          AtlasVertex datasetVertex = processEdge.getInVertex();
    
          traverseEdges(datasetVertex, OUTPUT, depth - 1, ret);
        }
      }
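
The direction handling suggested above can be tried on a toy graph. The following is a self-contained sketch under stated assumptions, not Atlas code: LineageDirectionSketch, its string-keyed maps, and the "from->to" edge strings are hypothetical stand-ins for AtlasVertex and AtlasEdge, and the toy graph is assumed to be acyclic. It shows BOTH being served by running the input pass and the output pass in turn, each bounded by depth.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LineageDirectionSketch {
    enum Direction { INPUT, OUTPUT, BOTH }

    // Toy graph: each process lists the datasets it reads (inputs) and writes (outputs).
    final Map<String, List<String>> inputs  = new HashMap<>();
    final Map<String, List<String>> outputs = new HashMap<>();

    void addProcess(String process, List<String> in, List<String> out) {
        inputs.put(process, in);
        outputs.put(process, out);
    }

    /** Collects lineage edges around a dataset; BOTH runs the two passes in turn. */
    Set<String> lineage(String dataset, Direction direction, int depth) {
        Set<String> ret = new LinkedHashSet<>();
        if (direction == Direction.INPUT || direction == Direction.BOTH) {
            traverse(dataset, true, depth, ret);
        }
        if (direction == Direction.OUTPUT || direction == Direction.BOTH) {
            traverse(dataset, false, depth, ret);
        }
        return ret;
    }

    private void traverse(String dataset, boolean isInput, int depth, Set<String> ret) {
        if (depth <= 0) {
            return; // depth budget exhausted
        }
        // Upstream pass: processes that write this dataset.
        // Downstream pass: processes that read this dataset.
        Map<String, List<String>> near = isInput ? outputs : inputs;
        Map<String, List<String>> far  = isInput ? inputs  : outputs;

        for (Map.Entry<String, List<String>> entry : near.entrySet()) {
            if (!entry.getValue().contains(dataset)) {
                continue;
            }
            String process = entry.getKey();
            ret.add(isInput ? process + "->" + dataset : dataset + "->" + process);

            for (String next : far.get(process)) {
                ret.add(isInput ? next + "->" + process : process + "->" + next);
                traverse(next, isInput, depth - 1, ret);
            }
        }
    }

    public static void main(String[] args) {
        // t1 -> p1 -> t2 -> p2 -> t3
        LineageDirectionSketch graph = new LineageDirectionSketch();
        graph.addProcess("p1", Arrays.asList("t1"), Arrays.asList("t2"));
        graph.addProcess("p2", Arrays.asList("t2"), Arrays.asList("t3"));
        System.out.println(graph.lineage("t2", Direction.BOTH, 3));
    }
}
```

Because each pass carries a plain boolean, the recursive helper never sees BOTH, which matches the later suggestion to replace the direction parameter with "boolean isInput".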



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 303 (patched)
<https://reviews.apache.org/r/71902/#comment307031>

    visitedIds => visitedVertices



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 308 (patched)
<https://reviews.apache.org/r/71902/#comment307028>

    Given 'BOTH' is not expected here, perhaps the parameter can be:
      "LineageDirection direction" => "boolean isInput"
    
    Same in #299 as well.



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 312 (patched)
<https://reviews.apache.org/r/71902/#comment307027>

    - Can 'incomingEdge' already be present in 'ret'? If yes, the rest of this loop body can be skipped with a 'continue' at #312. Please review.
    - Can 'processVertex' already have been visited? If yes, the rest of this loop body can be skipped with a 'continue' at #313. For this to work, processVertex should be added to 'visitedIds'. Please review.
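
The two early exits asked about above can be illustrated on a toy graph with a cycle. This is a hedged sketch, not the patch: VisitedTraversalSketch, its adjacency map, and the "from->to" edge strings are hypothetical, and the visits counter exists only to show that each vertex is expanded once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class VisitedTraversalSketch {
    // Toy directed graph: vertex -> vertices reachable over one edge.
    final Map<String, List<String>> adjacency = new HashMap<>();
    int visits = 0; // number of vertices expanded, for illustration only

    void addEdge(String from, String to) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Collects every edge reachable from start, expanding each vertex at most once. */
    Set<String> collectEdges(String start) {
        Set<String> edges = new LinkedHashSet<>();
        Set<String> visitedVertices = new HashSet<>();
        traverse(start, visitedVertices, edges);
        return edges;
    }

    private void traverse(String vertex, Set<String> visitedVertices, Set<String> edges) {
        visitedVertices.add(vertex);
        visits++;

        for (String next : adjacency.getOrDefault(vertex, Collections.<String>emptyList())) {
            String edge = vertex + "->" + next;

            if (!edges.add(edge)) {
                continue; // edge already recorded; skip the rest of the loop body
            }
            if (visitedVertices.contains(next)) {
                continue; // vertex already expanded; recursing again would loop forever
            }
            traverse(next, visitedVertices, edges);
        }
    }

    public static void main(String[] args) {
        // Circular lineage: a -> b -> c -> a
        VisitedTraversalSketch graph = new VisitedTraversalSketch();
        graph.addEdge("a", "b");
        graph.addEdge("b", "c");
        graph.addEdge("c", "a");
        System.out.println(graph.collectEdges("a") + " visits=" + graph.visits);
    }
}
```

The closing edge of the cycle is still recorded, so a circular lineage renders completely, but the traversal stops there instead of re-expanding the start vertex.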



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 320 (patched)
<https://reviews.apache.org/r/71902/#comment307033>

    Looks like #320 can be moved before entering this 'for' loop, at #315. Please review.



repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java
Lines 340 (patched)
<https://reviews.apache.org/r/71902/#comment307024>

    startVertex => vertex


- Madhan Neethiraj




Re: Review Request 71902: ATLAS-3558: Improve lineage performance using in-memory traversal

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/#review219035
-----------------------------------------------------------


Ship it!

- Madhan Neethiraj




Re: Review Request 71902: ATLAS-3558: Improve lineage performance using in-memory traversal

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------

(Updated Dec. 13, 2019, 7:53 p.m.)


Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.


Bugs: ATLAS-3558
    https://issues.apache.org/jira/browse/ATLAS-3558


Repository: atlas


Description
-------

Atlas uses a graph query to compute lineage across entities (inputs, outputs, or both). Lineage rendering performance has degraded since the move to JanusGraph 0.4.0.

On investigation, initialization and execution of the lineage graph query through the Gremlin script engine were found to be the bottleneck.

An alternate in-memory computation of lineage improved performance significantly (~90% improvement). This Jira adds that in-memory computation.

The "atlas.use.graph.query.for.lineage" property toggles between graph query and in-memory computation of lineage. The default is in-memory.


Diffs (updated)
-----

  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3 
  repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d 


Diff: https://reviews.apache.org/r/71902/diff/5/

Changes: https://reviews.apache.org/r/71902/diff/4-5/


Testing
-------

Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1585/console

Manually validated lineage rendering works fine for simple, complex and circular lineages.


Thanks,

Sarath Subramanian


Re: Review Request 71902: ATLAS-3558: Improve lineage performance using in-memory traversal

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------

(Updated Dec. 13, 2019, 4:38 p.m.)


Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.


Bugs: ATLAS-3558
    https://issues.apache.org/jira/browse/ATLAS-3558


Repository: atlas


Description
-------

Atlas uses a graph query to compute lineage across entities (inputs, outputs, or both). Lineage rendering performance has degraded since the move to JanusGraph 0.4.0.

On investigation, initialization and execution of the lineage graph query through the Gremlin script engine were found to be the bottleneck.

An alternate in-memory computation of lineage improved performance significantly (~90% improvement). This Jira adds that in-memory computation.

The "atlas.use.graph.query.for.lineage" property toggles between graph query and in-memory computation of lineage. The default is in-memory.


Diffs (updated)
-----

  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3 
  repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d 


Diff: https://reviews.apache.org/r/71902/diff/4/

Changes: https://reviews.apache.org/r/71902/diff/3-4/


Testing
-------

Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1585/console

Manually validated lineage rendering works fine for simple, complex and circular lineages.


Thanks,

Sarath Subramanian


Re: Review Request 71902: ATLAS-3558: Improve lineage performance using in-memory traversal

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------

(Updated Dec. 13, 2019, 2:54 p.m.)


Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.


Changes
-------

Addressed review comments.


Bugs: ATLAS-3558
    https://issues.apache.org/jira/browse/ATLAS-3558


Repository: atlas


Description
-------

Atlas uses a graph query to compute lineage across entities (inputs, outputs, or both). Lineage rendering performance has degraded since the move to JanusGraph 0.4.0.

On investigation, initialization and execution of the lineage graph query through the Gremlin script engine were found to be the bottleneck.

An alternate in-memory computation of lineage improved performance significantly (~90% improvement). This Jira adds that in-memory computation.

The "atlas.use.graph.query.for.lineage" property toggles between graph query and in-memory computation of lineage. The default is in-memory.


Diffs (updated)
-----

  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3 
  repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d 


Diff: https://reviews.apache.org/r/71902/diff/3/

Changes: https://reviews.apache.org/r/71902/diff/2-3/


Testing
-------

Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1581/console

Manually validated lineage rendering works fine for simple, complex and circular lineages.


Thanks,

Sarath Subramanian


Re: Review Request 71902: ATLAS-3558: Improve lineage performance using in-memory traversal

Posted by Sarath Subramanian <sa...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71902/
-----------------------------------------------------------

(Updated Dec. 12, 2019, 3:20 p.m.)


Review request for atlas, Ashutosh Mestry, Aadarsh Jajodia, keval bhatt, Sridhar K, Le Ma, Mandar Ambawane, mayank jain, Nixon Rodrigues, Sameer Shaikh, and Sarath Subramanian.


Changes
-------

Addressed review comments.


Bugs: ATLAS-3558
    https://issues.apache.org/jira/browse/ATLAS-3558


Repository: atlas


Description
-------

Atlas uses a graph query to compute lineage across entities (inputs, outputs, or both). Lineage rendering performance has degraded since the move to JanusGraph 0.4.0.

On investigation, initialization and execution of the lineage graph query through the Gremlin script engine were found to be the bottleneck.

An alternate in-memory computation of lineage improved performance significantly (~90% improvement). This Jira adds that in-memory computation.

The "atlas.use.graph.query.for.lineage" property toggles between graph query and in-memory computation of lineage. The default is in-memory.


Diffs (updated)
-----

  intg/src/main/java/org/apache/atlas/AtlasConfiguration.java 979bd0ae3 
  repository/src/main/java/org/apache/atlas/discovery/EntityLineageService.java 9a020468d 


Diff: https://reviews.apache.org/r/71902/diff/2/

Changes: https://reviews.apache.org/r/71902/diff/1-2/


Testing
-------

Precommit: https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1581/console

Manually validated lineage rendering works fine for simple, complex and circular lineages.


Thanks,

Sarath Subramanian