Posted to dev@atlas.apache.org by "Suma Shivaprasad (JIRA)" <ji...@apache.org> on 2015/08/25 16:42:46 UTC

[jira] [Comment Edited] (ATLAS-122) Support for Deletion of Entities

    [ https://issues.apache.org/jira/browse/ATLAS-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711368#comment-14711368 ] 

Suma Shivaprasad edited comment on ATLAS-122 at 8/25/15 2:42 PM:
-----------------------------------------------------------------

Deletion of entities raises some interesting scenarios:

1. If a hive_database is requested to be deleted, should we allow the deletion while tables in the model still refer to it? Or should we require the user to delete the tables first and then delete the database? To generalize: if an entity has incoming edges, we should throw an error saying that other entities depend on it and hence it cannot be deleted. If we don't throw an error, we face questions like: should we delete the database recursively along with the tables that refer to it? To what level/depth of nesting should we go? What if other entities, such as a hive_process, refer to the tables? Should we delete that process as well? We might lose history/version info if we delete it.

2. If an entity has outgoing edges (e.g. hive_tables has outgoing edges to a list of columns), can we generalize that the referred entities are also deleted if they have no incoming edges other than the one from the entity being deleted? However, this fails for outgoing lineage edges that point to other tables. For example, a hive_process has outgoing edges to its input and output tables, so when a delete is requested for a "hive_process/query", deleting the tables it refers to doesn't make much sense, even if no other process refers to those tables.
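To make the two options concrete, here is a minimal Python sketch over a toy in-memory graph. Everything in it is a hypothetical illustration, not Atlas's repository API: the `EntityGraph` and `Edge` classes, the `composite` flag marking ownership edges (such as table to column), and the entity names. It refuses deletion while any incoming reference exists (scenario 1) and cascades only along ownership edges whose targets have no other referrers, never along lineage edges such as process to table (scenario 2).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Directed reference src -> dst in a hypothetical model (not Atlas's API)."""
    src: str
    dst: str
    label: str
    composite: bool = False  # assumption: True means dst is owned by src (table -> column)


class EntityGraph:
    """Toy graph used only to illustrate the two deletion policies."""

    def __init__(self) -> None:
        self.entities: set[str] = set()
        self.edges: list[Edge] = []

    def add(self, guid: str) -> None:
        self.entities.add(guid)

    def link(self, src: str, dst: str, label: str, composite: bool = False) -> None:
        self.edges.append(Edge(src, dst, label, composite))

    def _incoming(self, guid: str, ignore: set[str]) -> list[Edge]:
        # Edges pointing at guid from entities outside the `ignore` set.
        return [e for e in self.edges if e.dst == guid and e.src not in ignore]

    def delete(self, guid: str) -> set[str]:
        if guid not in self.entities:
            raise KeyError(guid)
        # Scenario 1: refuse to delete an entity that others still reference.
        blockers = self._incoming(guid, ignore=set())
        if blockers:
            refs = sorted({e.src for e in blockers})
            raise ValueError(f"cannot delete {guid}: referenced by {refs}")
        # Scenario 2: cascade only along composite (ownership) edges;
        # non-composite edges such as lineage inputs/outputs are not followed.
        doomed = {guid}
        stack = [guid]
        while stack:
            current = stack.pop()
            for e in self.edges:
                if e.src != current or not e.composite or e.dst in doomed:
                    continue
                # Keep the child if an entity outside the deleted set refers to it.
                if self._incoming(e.dst, ignore=doomed):
                    continue
                doomed.add(e.dst)
                stack.append(e.dst)
        self.entities -= doomed
        self.edges = [e for e in self.edges
                      if e.src not in doomed and e.dst not in doomed]
        return doomed


if __name__ == "__main__":
    g = EntityGraph()
    for guid in ["db", "t1", "t2", "c1", "c2", "p"]:
        g.add(guid)
    g.link("t1", "db", "db")
    g.link("t2", "db", "db")
    g.link("t1", "c1", "columns", composite=True)
    g.link("t1", "c2", "columns", composite=True)
    g.link("p", "t1", "inputs")
    g.link("p", "t2", "outputs")

    try:
        g.delete("db")
    except ValueError as err:
        print(err)                 # cannot delete db: referenced by ['t1', 't2']
    print(g.delete("p"))           # {'p'}: lineage targets t1 and t2 survive
    print(sorted(g.delete("t1")))  # ['c1', 'c2', 't1']: owned columns cascade
```

The explicit `composite` flag is what lets the cascade stop at a process's input and output tables while still removing a table's columns; how Atlas's type system would express that distinction is the open question here.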


[~svenkat] Thoughts?




> Support for Deletion of Entities
> --------------------------------
>
>                 Key: ATLAS-122
>                 URL: https://issues.apache.org/jira/browse/ATLAS-122
>             Project: Atlas
>          Issue Type: New Feature
>            Reporter: Suma Shivaprasad
>            Assignee: Suma Shivaprasad
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)