Posted to dev@atlas.apache.org by Sheetal Shah <sh...@freestoneinfotech.com> on 2022/09/22 17:44:34 UTC

Re: Review Request 74130: ATLAS-4679 : Indexing of deleted relationship edges prolongs entity update time

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/74130/
-----------------------------------------------------------

(Updated Sept. 22, 2022, 11:14 p.m.)


Review request for atlas, Jayendra Parab, Mandar Ambawane, and Pinal Shah.


Repository: atlas


Description
-------

Problem statement: While working with a Kafka dump that contained messages from Spark streaming applications,
we observed that when an application is updated, most of the time is spent
re-indexing the edges, and that "deleted" relationship edges were also being
re-indexed every time an application was updated for an incoming process message.

The change considers only active edges when processing relationship edges, so only newly added edges
are processed/indexed. This makes a significant difference in processing time when the entity being updated has a large number of deleted edges.


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 68d331dfd 


Diff: https://reviews.apache.org/r/74130/diff/1/


Testing
-------


Thanks,

Sheetal Shah


Re: Review Request 74130: ATLAS-4679 : Indexing of deleted relationship edges prolongs entity update time

Posted by Sidharth Mishra <si...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/74130/#review224700
-----------------------------------------------------------


Ship it!

- Sidharth Mishra




Re: Review Request 74130: ATLAS-4679 : Indexing of deleted relationship edges prolongs entity update time

Posted by Radhika Kundam <ra...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/74130/#review224702
-----------------------------------------------------------


Ship it!

- Radhika Kundam




Re: Review Request 74130: ATLAS-4679 : Indexing of deleted relationship edges prolongs entity update time

Posted by Sheetal Shah <sh...@freestoneinfotech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/74130/
-----------------------------------------------------------

(Updated Sept. 22, 2022, 11:47 p.m.)


Review request for atlas, Jayendra Parab, Mandar Ambawane, and Pinal Shah.


Repository: atlas


Description (updated)
-------

Problem statement: While working with a Kafka dump that contained messages from Spark streaming applications,
we observed that when an application is updated, most of the time is spent
re-indexing the edges, and that "deleted" relationship edges were also being
re-indexed every time an application was updated for an incoming process message.
With 35k processes this took a few minutes to process (average time was 135 seconds), and the time would increase as new processes enter the system.

The change considers only active edges when processing relationship edges, so only newly added edges
are processed/indexed. This makes a significant difference in processing time when the entity being updated has a large number of deleted edges.
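
To illustrate the idea only (this is not the patch itself), here is a minimal, self-contained sketch of filtering an entity's relationship edges down to the active ones before re-indexing. The names ActiveEdgeFilter, Edge, Status and activeOnly are hypothetical stand-ins and are not Atlas classes or methods. The sketch also assumes that deleted edges stay attached to the entity and carry a state marker, as the description above implies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Atlas classes.
final class ActiveEdgeFilter {

    /** Lifecycle state of a relationship edge (assumed: deleted edges remain on the entity). */
    enum Status { ACTIVE, DELETED }

    /** Minimal edge model: an identifier plus its lifecycle state. */
    static final class Edge {
        final String id;
        final Status state;

        Edge(String id, Status state) {
            this.id = id;
            this.state = state;
        }
    }

    /** Returns only the ACTIVE edges, preserving their original order. */
    static List<Edge> activeOnly(List<Edge> edges) {
        List<Edge> active = new ArrayList<>();
        for (Edge edge : edges) {
            if (edge.state == Status.ACTIVE) {
                active.add(edge);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        List<Edge> edges = new ArrayList<>();
        // Simulate an entity that has accumulated many deleted edges over time.
        for (int i = 0; i < 1000; i++) {
            edges.add(new Edge("old-" + i, Status.DELETED));
        }
        edges.add(new Edge("new-0", Status.ACTIVE));

        // Only the filtered list would be handed to the (expensive) re-indexing step.
        List<Edge> toIndex = activeOnly(edges);
        System.out.println("edges on entity: " + edges.size());
        System.out.println("edges to re-index: " + toIndex.size());
    }
}
```

In this toy run only one of the 1001 edges is passed on for re-indexing, which is the kind of reduction the change relies on when an entity has accumulated many deleted edges.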


Diffs
-----

  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphMapper.java 68d331dfd 


Diff: https://reviews.apache.org/r/74130/diff/1/


Testing (updated)
-------

We tested the changes against the same Kafka dump, and the time taken to process messages was significantly lower. With the fix, only non-deleted edges are considered for processing/re-indexing, which gave a consistent processing time of around 1 to 2 seconds.


Thanks,

Sheetal Shah