Posted to dev@atlas.apache.org by "Suma Shivaprasad (JIRA)" <ji...@apache.org> on 2016/02/26 01:04:18 UTC
[jira] [Commented] (ATLAS-535) Support delete cascade efficently

    [ https://issues.apache.org/jira/browse/ATLAS-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168154#comment-15168154 ] 

Suma Shivaprasad commented on ATLAS-535:
----------------------------------------

{noformat}

Modelling DELETE cascades across entities


Background

Currently, the Typesystem allows relationship behaviour between types to be modelled through attribute flags. The isComposite flag on an attribute declares a “composition” relationship between the current type and the attribute's type: the referred instance is loaded or deleted whenever the current instance is loaded or deleted. For example, hive_table.columns is an isComposite attribute, so whenever a table is loaded or deleted, its columns are loaded or deleted too.

API changes

The deleteEntity API should take an additional flag to indicate cascading deletes.


Modelling/Repository changes

Option 1: 

Add an attribute array<hive_table> in hive_db 

Pros:

Works OOB and does not need any code changes.
Since the entity being deleted is also the parent entity from which the delete cascade begins, we know exactly which edges (the ones with label __typeName.attributeName) and vertices are to be deleted.

Cons:

The current support for such an attribute is limited in some cases. For example, Database -> Table and Table -> Partitions could have issues, since every added partition requires updating the Table entity to add it to array<partition>, which may not scale. Taking hourly and daily partitions over five years as the worst case, a table could have around ~50000 partition entries. What average number of tables should we support for a Database?
We would have to implement another flag, isVisible/lazyFetch, on the attribute so that the tables are not loaded/displayed, or are lazily fetched, when a database is loaded, since this attribute is added for internal reasons and should not be displayed when a database is viewed. If we add lazyFetch, should we load all the entries in the array?
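
The ~50000 figure above can be checked with a quick calculation. This is only a sketch, assuming 365-day years and one partition per hour or per day; the class and method names are made up for illustration.

```java
// Rough partition counts for a single table (365-day years assumed).
class PartitionEstimate {

    // Number of partitions created over the given number of years.
    static long partitions(int partitionsPerDay, int years) {
        return (long) partitionsPerDay * 365 * years;
    }

    public static void main(String[] args) {
        System.out.println("hourly over 5 years: " + partitions(24, 5)); // 43800
        System.out.println("daily over 5 years:  " + partitions(1, 5));  // 1825
    }
}
```

Hourly partitions dominate: about 43800 entries in the array attribute of one table, which is in line with the ~50000 estimate.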

Option 2:

Add an attribute flag called isInverseComposite on hive_table.db.


In this case, whenever an instance of hive_db needs to be deleted, look at all the incoming vertices with edge labels starting with __hive_db, look at their type definitions and check whether the isInverseComposite flag is set on them for the current type's attribute. If it is set, remove the corresponding vertices and edges.


Get and update behaviour is not affected by this flag.
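
A minimal sketch of the Option 2 lookup, using an in-memory edge list rather than the real graph store. Everything here is hypothetical: the Edge class, the method names and the sample data are illustrations, and the edge label format is assumed to be the __typeName.attributeName convention mentioned under Option 1. The set of flagged attributes stands in for reading isInverseComposite from the type definitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: an in-memory stand-in for the graph, not Atlas code.
class InverseCompositeSketch {

    // Hypothetical edge from the vertex owning the attribute to the referenced vertex.
    static final class Edge {
        final String label; // assumed format: "__<typeName>.<attributeName>"
        final String from;
        final String to;

        Edge(String label, String from, String to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    // Returns the vertices that reference deletedVertex through an attribute
    // flagged isInverseComposite; a cascading delete would remove these.
    static Set<String> cascadeTargets(String deletedVertex, List<Edge> edges,
                                      Set<String> inverseCompositeAttributes) {
        Set<String> targets = new LinkedHashSet<>();
        for (Edge edge : edges) {
            if (!edge.to.equals(deletedVertex) || !edge.label.startsWith("__")) {
                continue; // not an incoming attribute edge of the deleted vertex
            }
            String typeAndAttribute = edge.label.substring(2); // e.g. "hive_table.db"
            if (inverseCompositeAttributes.contains(typeAndAttribute)) {
                targets.add(edge.from);
            }
        }
        return targets;
    }

    // Sample data: two tables in db1, one in db2, plus a made-up unflagged reference.
    static List<Edge> sampleEdges() {
        List<Edge> edges = new ArrayList<>();
        edges.add(new Edge("__hive_table.db", "table1", "db1"));
        edges.add(new Edge("__hive_table.db", "table2", "db1"));
        edges.add(new Edge("__audit_entry.subject", "audit1", "db1"));
        edges.add(new Edge("__hive_table.db", "table3", "db2"));
        return edges;
    }

    // Attributes assumed to carry the isInverseComposite flag.
    static Set<String> sampleFlags() {
        Set<String> flags = new LinkedHashSet<>();
        flags.add("hive_table.db");
        return flags;
    }

    public static void main(String[] args) {
        // prints [table1, table2]
        System.out.println(cascadeTargets("db1", sampleEdges(), sampleFlags()));
    }
}
```

Note that the loop visits every edge, including the unflagged one, which is the scan cost discussed under the cons of this option.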


Pros:

Simple approach that doesn't need intrusive code changes.

Cons:

An additional flag that users need to define in the type definition. 
Need to iterate over all the edges (potentially a large number) and check which ones have labels starting with that typeName prefix. However, on average there would be only one or at most two such attributes with a potentially large number of edges, so the scan would mostly go through the vertices that need to be deleted anyway.


Option 3:

There is currently no way to model associations between any two types/classes. The proposal is to model this generically, so that various association rules between types, which are not attribute specific, can be represented. For example, Database to Table is a composition relationship.

Define a new generic internal type:


AssociationRule  

attributes:
 String targetType   // the type with which the association rule is being defined
 String name   // the name of this rule

Note: Typesystem will enforce a typecheck on the targetType using existing types. 


A type definition will have a Collection<AssociationRule> along with the existing attribute definitions, traits etc


CascadeRule extends AssociationRule


DeleteCascadeRule extends CascadeRule

Currently the only cascade type supported is DELETE.

Going forward, it could be extended to various other types, like the JPA cascade types (for updates, gets etc.): https://docs.oracle.com/javaee/6/api/javax/persistence/CascadeType.html

Also, going forward, AssociationRule(s) could be attached at the attribute level, i.e. isComposite on an attribute could be changed to a DeleteCascade rule instead, so that the same set of association rules applies at both the type and attribute levels.

When a delete with cascade is issued on an entity whose type contains a DeleteCascadeRule, delete any references from this entity that are of the targetType. For example, when a hive_db entity is deleted, all the hive_table entities associated with it are deleted. To find the vertices to delete, follow all edges whose labels start with the targetType's typeName (__hive_table) and delete the referred vertices. This should work for all the complex and collection types: array, map, struct and class references.
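
The rule hierarchy and the lookup described above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed implementation: the Edge class, the method names, the rule names and the sample data are all hypothetical, edge labels are assumed to follow the __typeName.attributeName convention, and the prefix match includes the trailing "." so that a type such as hive_table_v2 would not match hive_table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: in-memory stand-ins, not Atlas type system code.
class AssociationRuleSketch {

    static class AssociationRule {
        final String targetType; // the type with which the rule is defined
        final String name;       // the name of this rule

        AssociationRule(String targetType, String name) {
            this.targetType = targetType;
            this.name = name;
        }
    }

    static class CascadeRule extends AssociationRule {
        CascadeRule(String targetType, String name) {
            super(targetType, name);
        }
    }

    static class DeleteCascadeRule extends CascadeRule {
        DeleteCascadeRule(String targetType, String name) {
            super(targetType, name);
        }
    }

    // Hypothetical edge; label assumed to be "__<typeName>.<attributeName>".
    static final class Edge {
        final String label;
        final String from;
        final String to;

        Edge(String label, String from, String to) {
            this.label = label;
            this.from = from;
            this.to = to;
        }
    }

    // For each DeleteCascadeRule on the deleted entity's type, collect the vertices
    // at the other end of edges whose label starts with the rule's targetType.
    static Set<String> verticesToDelete(String deletedVertex,
                                        Collection<AssociationRule> rules,
                                        List<Edge> edges) {
        Set<String> result = new LinkedHashSet<>();
        for (AssociationRule rule : rules) {
            if (!(rule instanceof DeleteCascadeRule)) {
                continue; // only delete cascades are acted on here
            }
            String prefix = "__" + rule.targetType + ".";
            for (Edge edge : edges) {
                if (!edge.label.startsWith(prefix)) {
                    continue;
                }
                if (edge.to.equals(deletedVertex)) {
                    result.add(edge.from);
                } else if (edge.from.equals(deletedVertex)) {
                    result.add(edge.to);
                }
            }
        }
        return result;
    }

    // Sample: hive_db carries one DeleteCascadeRule targeting hive_table.
    static Set<String> demo() {
        List<AssociationRule> hiveDbRules = new ArrayList<>();
        hiveDbRules.add(new DeleteCascadeRule("hive_table", "dbOwnsTables"));
        hiveDbRules.add(new AssociationRule("hive_process", "unrelatedRule"));
        List<Edge> edges = Arrays.asList(
                new Edge("__hive_table.db", "table1", "db1"),
                new Edge("__hive_table.db", "table2", "db1"),
                new Edge("__hive_process.outputs", "process1", "db1"), // made-up attribute
                new Edge("__hive_table.db", "table3", "db2"));
        return verticesToDelete("db1", hiveDbRules, edges);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [table1, table2]
    }
}
```

Unlike Option 2, the rule lives on the type being deleted (hive_db), so the hive_table.db attribute definition itself stays unchanged.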


Pros:

Generic; can be used to define any association between two types and applied in any aspect of ATLAS, e.g. during entity mutation: updates, gets, delete behaviour etc.
The current hive model of the Table -> Database reference does not need to change, so there are no extra updates whenever a table is added, unlike Option 1.

Cons:

More intrusive; needs changes in the type system as well as in entity mutation.
Need to iterate over all the edges (potentially a large number) and check which ones have labels starting with that typeName prefix. However, on average there would be only one or at most two such attributes with a potentially large number of edges, so the scan would mostly go through the vertices that need to be deleted anyway. Also, deletes in general are likely to be a less used operation than creates/updates.


Due to its simplicity and non-intrusive code changes, I am leaning towards Option 2. Thoughts?

{noformat}

> Support delete cascade efficently
> ---------------------------------
>
>                 Key: ATLAS-535
>                 URL: https://issues.apache.org/jira/browse/ATLAS-535
>             Project: Atlas
>          Issue Type: Sub-task
>            Reporter: Suma Shivaprasad
>             Fix For: 0.7-incubating
>
>
> Currently there are some limitations in the typesystem and modelling to support delete cascades at scale through the isComposite flag



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)