Posted to dev@atlas.apache.org by "David Radley (JIRA)" <ji...@apache.org> on 2017/05/02 15:01:04 UTC

[jira] [Commented] (ATLAS-1690) Introduce top level relationships

    [ https://issues.apache.org/jira/browse/ATLAS-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993032#comment-15993032 ] 

David Radley commented on ATLAS-1690:
-------------------------------------

Thank you very much for your insightful reviews [~mandy_chessell] and [~suma.shivaprasad].

Response to Mandy's review

Pg2 "Metadata repositories store metadata. The context of a metadata object is dictated by its relationships."  I am not sure these sentences tell the complete story.  Maybe something like "The Apache Atlas metadata repository stores metadata objects and their relationships.  The relationships between the metadata objects are as important as the metadata objects themselves.  They explain how the data landscape is structured and how the components within it relate to the business and the governance requirements, ownership and other interested parties.  The relationships in Apache Atlas today provide support for containment (or part-of) relationships.  This is necessary to describe sub-components of a component - for example, a Hive Column is a sub-component of a Hive Table.  With these types of relationships, the lifetime of the sub-components is tied to their parent component.  So, for example, if a Hive table is deleted, then all of its columns should also be deleted.  This design looks to add support for a new type of relationship between metadata objects that have independent lifetimes.  In fact, the creation of these relationships is an auditable action that can impact how data is discovered, understood, secured, managed and removed.  Such relationships include when Glossary ..." <<David agreed - your words are much better :-) >>
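To make the lifetime distinction concrete, here is a minimal Python sketch. The ToyRepository class, its methods and the type names (hive_table, hive_column, glossary_term) are hypothetical illustrations, not the Atlas API. Deleting a table also deletes its contained columns, while a glossary term linked to one of those columns survives and only the link itself is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    guid: str
    type_name: str
    # guids of sub-components whose lifetime is tied to this entity (containment)
    children: list = field(default_factory=list)


class ToyRepository:
    """Hypothetical in-memory store used only to illustrate lifetimes."""

    def __init__(self):
        self.entities = {}
        # (guid1, guid2) pairs for relationships between independent entities
        self.associations = []

    def add(self, entity):
        self.entities[entity.guid] = entity

    def associate(self, guid1, guid2):
        self.associations.append((guid1, guid2))

    def delete(self, guid):
        entity = self.entities.pop(guid)
        # Containment: sub-components are deleted together with their parent.
        for child_guid in entity.children:
            if child_guid in self.entities:
                self.delete(child_guid)
        # Independent lifetimes: only links touching a deleted entity are
        # dropped; the entity at the other end survives.
        self.associations = [
            (a, b) for (a, b) in self.associations
            if a in self.entities and b in self.entities
        ]


repo = ToyRepository()
repo.add(Entity("table-1", "hive_table", children=["col-1", "col-2"]))
repo.add(Entity("col-1", "hive_column"))
repo.add(Entity("col-2", "hive_column"))
repo.add(Entity("term-1", "glossary_term"))
repo.associate("term-1", "col-1")  # glossary term assigned to a column

repo.delete("table-1")
print(sorted(repo.entities))  # ['term-1']
print(repo.associations)      # []
```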

Pg2  "If these links are made incorrectly (purposely or otherwise) data can be inappropriately exposed." This comment is out of place - it is only true if the relationship is involved in access control.  A more general comment could be "If these links are made incorrectly (purposely or otherwise) data may be inappropriately used or governed." <<David agreed>>

Font of JSON example on page 4 is inconsistent - harder to read than necessary. <<David agreed - I have made all the json fonts consistent>>

pg5 - "Relationship constraints" - this is the first time the term is mentioned - it should be introduced in the examples above.<<David I have removed this phrase - I meant the constraints that are in the examples>> 

pg5 - "This name will help us name an association and its associated classification."  Not sure what classification means in this sentence.  The document also needs a description of why an association needs a name (I am thinking of this as a Type name - is that right?).  The name is important because the creation of these types of relationships is a deliberate act of governance and we need to be able to describe their use - and govern their lifecycle. <<David removed the relationship name paragraph>> 

pg 5 "“Address” and “Person”; a person has addresses, and addresses have people living in them. In this case, there is no obvious direction, so a bidirectional relationship is a natural way of associating these concepts; the alternative would be two directional relationships that would not be kept in sync."  Please use a metadata description - it is confusing to talk about data relationships. <<David agreed>>

pg 5 "There are 2 main styles of relationships, tight and loose relationships."  Why have new names been chosen for these when at the top the doc states it is using UML names?  Also the names are misleading.  There is nothing loose about the association between a glossary term and a database column.  <<David agreed >>

pg 6 "In the case of tight relationships, the top entity and its children are governed as one, as the lifecycles of the children are tied to the parent. "  It is true that the lifecycles are linked but it does not mean the governance is tied - for example, the confidentiality classification of a table may be different from the different columns it is made up of.  Governance rules may be defined to act on specific columns and not on a table as a whole.
<<David changed >>
pg 6/7 - RelationshipDef example - please use metadata examples not data examples - it is confusing because you would never define types for customer and account in Atlas.<<David agreed >>

pg7 "The entity instances use Atlas object ids pointing to the relationship instance (which has a guid)."  This needs further explanation and an example.
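One possible reading of that sentence, sketched in Python under stated assumptions: the relationship instance has its own guid and records the object ids of both ends, and each entity holds, under a relationship attribute, an object id pointing at that relationship instance. ObjectId, relate, navigate, the term_assignment type and the attribute names assignedEntities and meanings are all hypothetical. Because both entities point at the same single instance, there is nothing to keep in sync between the two directions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectId:
    """Simplified, hypothetical stand-in for an object id: guid + type name."""
    guid: str
    type_name: str


@dataclass
class RelationshipInstance:
    guid: str  # the relationship instance has its own guid
    type_name: str
    end1: ObjectId
    end2: ObjectId


@dataclass
class EntityInstance:
    guid: str
    type_name: str
    # relationship attribute name -> object ids of relationship instances
    relationships: dict = field(default_factory=dict)


def relate(store, rel_type, e1, attr1, e2, attr2):
    """Create one relationship instance and point both entities at it."""
    rel = RelationshipInstance(
        str(uuid.uuid4()), rel_type,
        ObjectId(e1.guid, e1.type_name), ObjectId(e2.guid, e2.type_name))
    store[rel.guid] = rel
    ref = ObjectId(rel.guid, rel_type)
    e1.relationships.setdefault(attr1, []).append(ref)
    e2.relationships.setdefault(attr2, []).append(ref)
    return rel


def navigate(store, entity, attr):
    """Follow relationship references to the object ids at the other end."""
    others = []
    for ref in entity.relationships.get(attr, []):
        rel = store[ref.guid]
        others.append(rel.end2 if rel.end1.guid == entity.guid else rel.end1)
    return others


store = {}
term = EntityInstance("term-1", "glossary_term")
column = EntityInstance("col-1", "hive_column")
relate(store, "term_assignment", term, "assignedEntities", column, "meanings")

print(navigate(store, column, "meanings"))
# [ObjectId(guid='term-1', type_name='glossary_term')]
```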

pg 8 "Read" - what are the parameters on read - is this a single relationship operation?<<David updated >>

pg 8 "Aggregation implies that there is containment "  I know what you mean but aggregation and containment are different things in UML and so this statement is not logically correct.<<David updated >>

pg8 "A natural way to specify aggregation would be to have an isContainer Boolean flag, defaulting to false and specified on one of the endpoints in the relationship."  It should say that this flag can only be set on one end.  <<David updated >>

pg8 - aggregations example - please use a metadata example such as category to term <<David updated >>
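A sketch of the category-to-term aggregation, with the rule that isContainer may be set on only one end enforced when the definition is built. EndDef, RelationshipDef, validate and the field names are hypothetical Python stand-ins for the proposal's RelationshipDef, not Atlas classes.

```python
from dataclasses import dataclass


@dataclass
class EndDef:
    type_name: str               # entity type at this end
    name: str                    # attribute name used to navigate from this end
    cardinality: str = "SINGLE"  # "SINGLE" or "SET"
    is_container: bool = False


@dataclass
class RelationshipDef:
    name: str
    end1: EndDef
    end2: EndDef

    def validate(self):
        # isContainer may be set on at most one end of the relationship.
        if self.end1.is_container and self.end2.is_container:
            raise ValueError(f"{self.name}: isContainer set on both ends")
        return self

    @property
    def is_aggregation(self):
        return self.end1.is_container or self.end2.is_container


# A category aggregates terms; a term can belong to several categories and
# keeps its own lifetime when a category is deleted.
category_terms = RelationshipDef(
    "category_terms",
    EndDef("glossary_category", "terms", "SET", is_container=True),
    EndDef("glossary_term", "categories", "SET"),
).validate()
print(category_terms.is_aggregation)  # True

try:
    RelationshipDef(
        "bad_def",
        EndDef("glossary_category", "terms", "SET", is_container=True),
        EndDef("glossary_term", "categories", "SET", is_container=True),
    ).validate()
except ValueError as err:
    print(err)  # bad_def: isContainer set on both ends
```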

pg8 - observations - a relationship described by a relationshipDef cannot be mandatory.  The isOptional flag is obsolete.  Can we remove it?<<David Not easily, as we are using the standard attribute definitions in entity definitions >>

Response to Suma's review. 

1. What does relationship "structure" indicate in the examples? I didn't see any reason why it's needed. Can you please illustrate why it's needed? <<David I have updated the docs. Examples would be an "expires on" date or a retention period. >>
2. Also, I saw a note saying we don't need a relationship "name". However, I saw it in the examples, and it is needed to define the relationship type in the type system, correct? <<David I have removed name in the composition case; I have left a discussion point on what this might mean for the top level relationship>>
3. Also, it would be good to illustrate what this means in terms of instances: how would the edges translate in the graph in the design? <<David I have added some pictures>>
4. Also, would adding a relationship type/category help in identifying different kinds of relationship? For example: composition/aggregation/inheritance/association etc., instead of indirectly deriving it from flags like isContainer? There might be other categories which may not map one-to-one to UML, like is_a_type_of, which you had mentioned in the Glossary Design doc. This would also help in easily discovering relationships among model types, which is not possible currently. <<David Yes, this could be a good way to implement this. I know V1 started with isComposite. As the relationship top level object is only used for bidirectional relationships, there is not one object to put the categories of relationship types in. Let me know what you think.    >>
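For items 1 and 3, a toy property graph in Python shows one way the instances could map onto the graph: each entity is a vertex, each relationship instance is a single edge whose properties hold the relationship's guid and its own attributes (here a hypothetical expiresOn date), and traversal works from either end. ToyGraph, its methods and the term_assignment label are assumptions for illustration, not how the Atlas graph layer is written.

```python
import uuid
from collections import defaultdict
from datetime import date


class ToyGraph:
    """Hypothetical property graph, used only to picture the design."""

    def __init__(self):
        self.vertices = {}                # vertex guid -> properties
        self.edges = {}                   # edge guid -> (label, out, in, properties)
        self.incident = defaultdict(set)  # vertex guid -> guids of touching edges

    def add_entity(self, guid, type_name):
        self.vertices[guid] = {"guid": guid, "typeName": type_name}

    def add_relationship(self, label, out_guid, in_guid, **attributes):
        # One relationship instance becomes one edge; the relationship's
        # own guid and attributes are stored as properties on that edge.
        edge_guid = str(uuid.uuid4())
        self.edges[edge_guid] = (label, out_guid, in_guid,
                                 dict(attributes, guid=edge_guid))
        self.incident[out_guid].add(edge_guid)
        self.incident[in_guid].add(edge_guid)
        return edge_guid

    def neighbours(self, guid, label, today=None):
        """Follow edges with this label in either direction, skipping any
        relationship whose expiresOn date is before `today`."""
        found = []
        for edge_guid in self.incident[guid]:
            edge_label, out_guid, in_guid, props = self.edges[edge_guid]
            if edge_label != label:
                continue
            expires_on = props.get("expiresOn")
            if today is not None and expires_on is not None and today > expires_on:
                continue
            found.append(in_guid if out_guid == guid else out_guid)
        return found


g = ToyGraph()
g.add_entity("term-1", "glossary_term")
g.add_entity("col-1", "hive_column")
g.add_relationship("term_assignment", "term-1", "col-1",
                   expiresOn=date(2017, 12, 31))

print(g.neighbours("col-1", "term_assignment"))                    # ['term-1']
print(g.neighbours("term-1", "term_assignment"))                   # ['col-1']
print(g.neighbours("col-1", "term_assignment", date(2018, 1, 1)))  # []
```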

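Item 4's suggestion can be sketched as an explicit category on each relationship definition. The enum values, the category_from_flags helper and the definition names below are hypothetical; in particular, the rule that a composite relationship must also have a container end is an assumption about how the isComposite and isContainer flags would combine.

```python
from enum import Enum


class RelationshipCategory(Enum):
    ASSOCIATION = "ASSOCIATION"
    AGGREGATION = "AGGREGATION"
    COMPOSITION = "COMPOSITION"


def category_from_flags(is_container, is_composite):
    """Reconstruct the category from flags, as a reader must do without it.
    Assumption: composite implies a container end."""
    if is_composite and not is_container:
        raise ValueError("a composite relationship needs a container end")
    if is_composite:
        return RelationshipCategory.COMPOSITION
    if is_container:
        return RelationshipCategory.AGGREGATION
    return RelationshipCategory.ASSOCIATION


# With an explicit category on each definition, discovery is a simple filter.
relationship_defs = {
    "table_columns": RelationshipCategory.COMPOSITION,
    "category_terms": RelationshipCategory.AGGREGATION,
    "term_assignment": RelationshipCategory.ASSOCIATION,
}


def defs_in_category(defs, category):
    return sorted(name for name, cat in defs.items() if cat is category)


print(category_from_flags(is_container=True, is_composite=False).name)
# AGGREGATION
print(defs_in_category(relationship_defs, RelationshipCategory.COMPOSITION))
# ['table_columns']
```

An explicit category also leaves room for categories that do not map one-to-one to UML, without adding another Boolean flag for each.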



> Introduce top level relationships
> ---------------------------------
>
>                 Key: ATLAS-1690
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1690
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: David Radley
>            Assignee: David Radley
>              Labels: VirtualDataConnector
>         Attachments: Atlas Relationships proposal v1.0.pdf, Atlas Relationships proposal v1.1.pdf, Atlas Relationships proposal v1.2.pdf, Atlas Relationships proposal v1.3.pdf, Atlas Relationships proposal v1.4.pdf
>
>
> Introduce top level relationships, including support for:
> - many-to-many relationships
> - relationship names, including the name for both ends and the relationship.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)