Posted to dev@atlas.apache.org by Ashutosh Mestry via Review Board <no...@reviews.apache.org> on 2020/03/30 23:19:31 UTC

Review Request 72287: Edge Creation Improvements

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/
-----------------------------------------------------------

Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Bugs: ATLAS-3706
    https://issues.apache.org/jira/browse/ATLAS-3706


Repository: atlas


Description
-------

**Approach**

1. Added metrics to most of the methods involved in entity creation. (The patch does not include the additional metrics that were added in other places.)
2. Started importing a large number of entities using the _ZipFileMigrationImporter_.
3. Observed the behavior of the import over 24 hours. Observations included CPU usage, memory usage, and import throughput (using _metric.log_).
4. Changes were added one at a time. The impact of each change on performance (via metric.log) and accuracy was observed before the next change was added.
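
To illustrate step 1, here is a minimal sketch of per-method timing. It is not the instrumentation used in the patch: `MethodMetricsSketch`, `timed`, `count` and `totalNanos` are hypothetical names, and the sketch assumes only the JDK rather than any Atlas metrics class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical helper: accumulates call counts and elapsed time per metric name.
public class MethodMetricsSketch {
    private static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> NANOS  = new ConcurrentHashMap<>();

    // Runs body, then records one invocation and its elapsed time under name,
    // even when body throws.
    public static <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            COUNTS.computeIfAbsent(name, k -> new LongAdder()).increment();
            NANOS.computeIfAbsent(name, k -> new LongAdder()).add(elapsed);
        }
    }

    // Number of recorded invocations for name (0 if never recorded).
    public static long count(String name) {
        LongAdder adder = COUNTS.get(name);
        return adder == null ? 0L : adder.sum();
    }

    // Total elapsed nanoseconds recorded for name (0 if never recorded).
    public static long totalNanos(String name) {
        LongAdder adder = NANOS.get(name);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            timed("createEdge", () -> Integer.valueOf(42));
        }
        System.out.println("createEdge count=" + count("createEdge")
                + " totalNanos=" + totalNanos("createEdge"));
    }
}
```

Wrapping one method at a time this way makes it possible to compare the accumulated time per method before and after each change, which is the comparison step 4 relies on.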

**Observations**
* Relationship creation took an inordinately large amount of time under load. The time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This implementation also caused a build-up of _AtlasEdge_ objects that stayed in memory for a long time. This had the secondary effect of slowing down entity creation operations after about 6 hours (the duration varied with node configuration).
* _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is time-consuming.
* _GraphBackedSearchIndexer_ edge label index: the majority of edge creation operations included a lookup by edge label.
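
The first two observations can be pictured with a toy model. This is a sketch under stated assumptions, not the code in _GraphHelper_: `Vertex`, `Edge`, `findEager`, `findLazy` and `getOrCreateEdge` below are hypothetical stand-ins that only mimic the general pattern (materializing every edge for a label versus stopping at the first match and comparing vertex ids).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory graph; none of these types are Atlas classes.
public class EdgeLookupSketch {
    public static final class Vertex {
        public final long id;
        final Map<String, List<Edge>> outEdgesByLabel = new HashMap<>();
        public Vertex(long id) { this.id = id; }
    }

    public static final class Edge {
        public final String label;
        public final Vertex out;
        public final Vertex in;
        public Edge(String label, Vertex out, Vertex in) {
            this.label = label;
            this.out = out;
            this.in = in;
        }
    }

    public static Edge addEdge(Vertex out, Vertex in, String label) {
        Edge edge = new Edge(label, out, in);
        out.outEdgesByLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(edge);
        return edge;
    }

    // Eager variant: copies every edge with the label into a new list before
    // searching it, so each lookup allocates in proportion to the edge count.
    public static Edge findEager(Vertex out, Vertex in, String label) {
        List<Edge> copy = new ArrayList<>(
                out.outEdgesByLabel.getOrDefault(label, Collections.emptyList()));
        for (Edge edge : copy) {
            if (edge.in.id == in.id) {
                return edge;
            }
        }
        return null;
    }

    // Lazy variant: walks the existing edges without copying, compares vertex
    // ids only, and returns at the first match.
    public static Edge findLazy(Vertex out, Vertex in, String label) {
        for (Edge edge : out.outEdgesByLabel.getOrDefault(label, Collections.emptyList())) {
            if (edge.in.id == in.id) {
                return edge;
            }
        }
        return null;
    }

    // Returns the existing edge between the two vertices, or creates one.
    public static Edge getOrCreateEdge(Vertex out, Vertex in, String label) {
        Edge existing = findLazy(out, in, label);
        return existing != null ? existing : addEdge(out, in, label);
    }

    public static void main(String[] args) {
        Vertex table = new Vertex(1);
        Vertex column = new Vertex(2);
        Edge first = getOrCreateEdge(table, column, "columns");
        Edge second = getOrCreateEdge(table, column, "columns");
        System.out.println("same edge reused: " + (first == second));
    }
}
```

Both variants return the same edge; they differ only in how much they allocate and how far they iterate, which is the kind of cost that shows up under a sustained import.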

**Configuration**
Cluster: 3 node: 40 cores, 128 GB RAM, 1.5 TB of disk space.
Atlas configuration: 32 GB RAM.


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java 647e3040c 
  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 5ab9f4d13 


Diff: https://reviews.apache.org/r/72287/diff/1/


Testing
-------

**Manual tests**
(See above).
Accuracy verification.

**Unit tests**
Executed existing unit tests.

**Pre-commit build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/


Thanks,

Ashutosh Mestry


Re: Review Request 72287: Edge Creation Improvements

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220193
-----------------------------------------------------------


Ship it!

- Madhan Neethiraj




Re: Review Request 72287: Edge Creation Improvements

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/
-----------------------------------------------------------

(Updated April 2, 2020, 6:14 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Updates include: New PC build details.


Bugs: ATLAS-3706
    https://issues.apache.org/jira/browse/ATLAS-3706


Repository: atlas


Description
-------

**Approach**

1. Added metrics to most of the methods involved in entity creation. (The patch does not include the additional metrics that were added in other places.)
2. Started importing a large number of entities using the _ZipFileMigrationImporter_.
3. Observed the behavior of the import over 24 hours. Observations included CPU usage, memory usage, and import throughput (using _metric.log_).
4. Changes were added one at a time. The impact of each change on performance (via metric.log) and accuracy was observed before the next change was added.

**Observations**
* Relationship creation took an inordinately large amount of time under load. The time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This implementation also caused a build-up of _AtlasEdge_ objects that stayed in memory for a long time. This had the secondary effect of slowing down entity creation operations after about 6 hours (the duration varied with node configuration).
* _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is time-consuming.
* _GraphBackedSearchIndexer_ edge label index: the majority of edge creation operations included a lookup by edge label.

**Configuration**
Cluster: 3 node: 40 cores, 128 GB RAM, 1.5 TB of disk space.
Atlas configuration: 32 GB RAM.


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 5ab9f4d13 


Diff: https://reviews.apache.org/r/72287/diff/2/


Testing (updated)
-------

**Manual tests**
(See above).
Accuracy verification.

**Unit tests**
Executed existing unit tests.

**Pre-commit build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1782/


Thanks,

Ashutosh Mestry


Re: Review Request 72287: Edge Creation Improvements

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/
-----------------------------------------------------------

(Updated April 2, 2020, 3:27 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and Sarath Subramanian.


Changes
-------

Updates include: Addressed review comments.


Bugs: ATLAS-3706
    https://issues.apache.org/jira/browse/ATLAS-3706


Repository: atlas


Description
-------

**Approach**

1. Added metrics to most of the methods involved in entity creation. (The patch does not include the additional metrics that were added in other places.)
2. Started importing a large number of entities using the _ZipFileMigrationImporter_.
3. Observed the behavior of the import over 24 hours. Observations included CPU usage, memory usage, and import throughput (using _metric.log_).
4. Changes were added one at a time. The impact of each change on performance (via metric.log) and accuracy was observed before the next change was added.

**Observations**
* Relationship creation took an inordinately large amount of time under load. The time was spent in _GraphHelper.getAdjacentEdgesByLabel_. This implementation also caused a build-up of _AtlasEdge_ objects that stayed in memory for a long time. This had the secondary effect of slowing down entity creation operations after about 6 hours (the duration varied with node configuration).
* _GraphHelper.getOrCreateEdge_ did a vertex-to-vertex comparison, which is time-consuming.
* _GraphBackedSearchIndexer_ edge label index: the majority of edge creation operations included a lookup by edge label.

**Configuration**
Cluster: 3 node: 40 cores, 128 GB RAM, 1.5 TB of disk space.
Atlas configuration: 32 GB RAM.


Diffs (updated)
-----

  repository/src/main/java/org/apache/atlas/repository/graph/GraphHelper.java 5ab9f4d13 


Diff: https://reviews.apache.org/r/72287/diff/2/

Changes: https://reviews.apache.org/r/72287/diff/1-2/


Testing
-------

**Manual tests**
(See above).
Accuracy verification.

**Unit tests**
Executed existing unit tests.

**Pre-commit build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1776/


Thanks,

Ashutosh Mestry


Re: Review Request 72287: Edge Creation Improvements

Posted by Ashutosh Mestry via Review Board <no...@reviews.apache.org>.

> On April 2, 2020, 5:39 a.m., Madhan Neethiraj wrote:
> > repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java
> > Lines 344 (patched)
> > <https://reviews.apache.org/r/72287/diff/1/?file=2216514#file2216514line344>
> >
> >     edgeLabel is typically used to find a subset of edges from a given vertex. Having an edge index on the label probably won't help improve performance; however, we need to understand the impact of creating this index in an existing Atlas instance that has a large number of edges. 1) Would the index be populated with existing edge labels? 2) If yes, how long would the index creation take, say for 1m edges? 3) If no, would search ignore edges that were not indexed?
> >     
> >     I suggest measuring the performance impact of not having this index.

I did a run last night without the index, and removing it had no impact on performance. I have removed this change.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220181
-----------------------------------------------------------




Re: Review Request 72287: Edge Creation Improvements

Posted by Madhan Neethiraj <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220181
-----------------------------------------------------------


Fix it, then Ship it!





repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java
Lines 344 (patched)
<https://reviews.apache.org/r/72287/#comment308499>

    edgeLabel is typically used to find a subset of edges from a given vertex. Having an edge index on the label probably won't help improve performance; however, we need to understand the impact of creating this index in an existing Atlas instance that has a large number of edges. 1) Would the index be populated with existing edge labels? 2) If yes, how long would the index creation take, say for 1m edges? 3) If no, would search ignore edges that were not indexed?
    
    I suggest measuring the performance impact of not having this index.


- Madhan Neethiraj




Re: Review Request 72287: Edge Creation Improvements

Posted by Nikhil Bonte <ni...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72287/#review220180
-----------------------------------------------------------


Ship it!

- Nikhil Bonte

