Posted to dev@atlas.apache.org by "Mandy Chessell (JIRA)" <ji...@apache.org> on 2017/02/19 11:36:45 UTC

[jira] [Commented] (ATLAS-1410) V2 Glossary API

    [ https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873650#comment-15873650 ] 

Mandy Chessell commented on ATLAS-1410:
---------------------------------------

Comments on V1.0

- Page numbers would help to tie these comments to the document.
- Page 2 - Asset type - defined in terms of itself.  How are asset types used, or is this not relevant to this paper?
- Page 2 - Why do we need to know about V1 and V2?  I think it is because the current interface works with V1 and the new one will work with V2 - it would be helpful to state this explicitly.
- Page 4 - bullets 4-5 - has-a and is-a relationships are semantic relationships.
- Page 4 - Missing from the list - the ability to associate a semantic meaning with a classification (V2) or trait (V1)?
- Page 4 - Missing from the list - a "typed-by" relationship to associate terms that include meaning in context with terms that describe purer objects.  For example, Home Address is typed by Address.
- Page 5 - Figure 1 - I am not comfortable with terms being owned by categories.  I think each term should be owned by a glossary and linked into 0, 1 or more categories as appropriate.  This creates a much simpler deletion rule for the API/end user: delete a term from its glossary and it is deleted.  This matters particularly in Figure 2, where terms are owned by multiple categories.  The proposed design raises such questions as "Is the term deleted when unlinked from all categories - or from the first category it is linked to?"
- Page 6 - Figure 3 - I need more detail to understand the "classifies" relationship and how it relates to a classification.  It seems redundant.  Would you not relate a term to a classification which is in itself semantically classified by its definition term?
- Page 6 - Bullet 6) - What is the alternative to using Gremlin queries?
- Page 6 - Bullet 7) - Is this an incomplete sentence - or is the paragraph that follows supposed to be a nested bullet list?  I am assuming it is a follow-on point.  My confusion is that I do not understand why the term/category hierarchy is relevant to the enhancement of classifications.  The Classification object is defining the type of classification and its meaning is coming from the term?  Is this suggesting that the relationships between classifications come from the term relationships, in the same way we do this in IGC today?  If so, it may help to show an example.
- Page 7 - Figures 4 and 5 - What is the difference between "Classification" and "Classification Relationship"?
- Page 7 - Maybe strange examples - the glossaries would be for different subject areas - for example, there may be a marketing glossary, a customer care glossary, a banking glossary.  These may be used for associating meaning to data assets.  There may also be glossaries for different regulations, or standard governance approaches, and these may include terms that can be used to describe classifications for data that drive operational governance?
- Page 8 - I am not sure what the proposed enhancements are - it just seems to list the problems with the current model.  All relationships in metadata are bi-directional, so this should be the default.  This mechanism seems complicated.  We really need to define relationships independently of entities so that we can define attributes on these relationships.  The Classification is actually an example of an independently defined relationship that includes the GUIDs of the 2 entities it connects.  This should be the common style of relationship.
- Page 9 - on discussion point - a Taxonomy is a hierarchy of categories that the terms are placed in - I thought this was included in the proposal and we do need this for organising terms so that people can find them - and the category hierarchies (taxonomies) help to provide context to terms too.  Also, the semantic relationships discussed would mean we could support a simple ontology.
- Page 9 - Fully-qualified name - What is a grandparent or parent term?  What does a fully qualified name mean and when is it used?  The unique name of a term is its GUID.  Its path name (there may be many) is the navigation to the term through the category hierarchies.
- Page 9 - Why do Atlas terms need to follow the schema defined at this link - https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html ?  It seems to imply a lifecycle which is not included in this proposal, and a very specific modelling of the IBM industry models that have mandatory fields that are not always applicable to all glossaries.  I think this doc should describe the schema of the glossary term explicitly and explain the fields.
- Page 10 - Figure 7 shows the navigation relationships as one-way.  We need to be able to navigate from the hive table to its classification to support the GAF.
- Page 11 - Figure 8 - In the Atlas entities box it is hard to see which are terms and which are assets (i.e. hive columns).
- Page 12 - Fully qualified name - Does this require all categories to be in a 3 level hierarchy - or is this an example of a path name that happens to be 3 levels deep?
- Page 12 - What does the Taxonomy refer to in this table?
- Page 13 - The Glossary API is an OMAS API.
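
The Page 5 and Page 9 comments above (terms owned by a glossary, linked into zero or more categories, with one path name per category link) can be sketched roughly as follows. This is only an illustration under stated assumptions: GlossarySketch, Glossary, Category, Term and every method on them are hypothetical names invented for this note, not part of the Atlas API or of the proposal document, and the "/" path separator is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch only: none of these types exist in Atlas.
public class GlossarySketch {

    /** A term is identified by its GUID and is owned by exactly one glossary. */
    public static final class Term {
        public final String guid = UUID.randomUUID().toString();
        public final String name;
        Term(String name) { this.name = name; }
    }

    /** A category links to terms by GUID but never owns them. */
    public static final class Category {
        public final String name;
        public final Category parent; // null for a top-level category
        final Set<String> linkedTermGuids = new LinkedHashSet<>();

        Category(String name, Category parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Path from the top-level category down to this one. */
        public String path() {
            return parent == null ? name : parent.path() + "/" + name;
        }
    }

    /** The glossary is the single owner of its terms and categories. */
    public static final class Glossary {
        public final String name;
        private final Map<String, Term> terms = new LinkedHashMap<>();
        private final List<Category> categories = new ArrayList<>();

        public Glossary(String name) { this.name = name; }

        public Term addTerm(String termName) {
            Term term = new Term(termName);
            terms.put(term.guid, term);
            return term;
        }

        public Category addCategory(String categoryName, Category parent) {
            Category category = new Category(categoryName, parent);
            categories.add(category);
            return category;
        }

        public void link(Term term, Category category) {
            if (!terms.containsKey(term.guid)) {
                throw new IllegalArgumentException("Term is not owned by glossary " + name);
            }
            category.linkedTermGuids.add(term.guid);
        }

        /** Unlinking never deletes: the term stays in the glossary. */
        public void unlink(Term term, Category category) {
            category.linkedTermGuids.remove(term.guid);
        }

        public boolean contains(Term term) {
            return terms.containsKey(term.guid);
        }

        /** The one deletion rule: remove the term from its glossary, then drop its links. */
        public boolean deleteTerm(Term term) {
            if (terms.remove(term.guid) == null) {
                return false;
            }
            for (Category category : categories) {
                category.linkedTermGuids.remove(term.guid);
            }
            return true;
        }

        /** One path name per category the term is linked into; the GUID stays the unique name. */
        public List<String> pathNames(Term term) {
            List<String> paths = new ArrayList<>();
            for (Category category : categories) {
                if (category.linkedTermGuids.contains(term.guid)) {
                    paths.add(category.path() + "/" + term.name);
                }
            }
            return paths;
        }
    }

    public static void main(String[] args) {
        Glossary glossary = new Glossary("Customer Care");
        Category contact = glossary.addCategory("Contact", null);
        Category postal = glossary.addCategory("Postal", contact);
        Term homeAddress = glossary.addTerm("Home Address");
        glossary.link(homeAddress, contact);
        glossary.link(homeAddress, postal);
        System.out.println(glossary.pathNames(homeAddress));
        glossary.deleteTerm(homeAddress);
        System.out.println(glossary.contains(homeAddress));
    }
}
```

In this sketch, unlinking a term from every category leaves it in the glossary, so the question "is the term deleted when unlinked from all categories?" does not arise. Deletion only happens through the glossary.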



> V2 Glossary API
> ---------------
>
>                 Key: ATLAS-1410
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1410
>             Project: Atlas
>          Issue Type: Improvement
>            Reporter: David Radley
>            Assignee: David Radley
>         Attachments: Atlas Glossary V2 proposal v1.0.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem. There are newer, more functional versions of this capability in the atlas-intg project. This Jira is changing over the glossary implementation to the newer entity / type classes.
> Instead of the instanceProperties and collectionProperties in the BaseResourceDefinitions we should use something in this sort of style:
> "
>  AtlasEntityDef deptTypeDef =
>                 AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE, "Department"+_description, ImmutableSet.<String>of(),
>                         AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>                         new AtlasAttributeDef("employees", String.format("array<%s>", "Person"), true,
>                                 AtlasAttributeDef.Cardinality.SINGLE, 0, 1, false, false,
>                                 Collections.<AtlasStructDef.AtlasConstraintDef>emptyList()));
>         AtlasEntityDef personTypeDef = AtlasTypeUtil.createClassTypeDef("Person", "Person"+_description, ImmutableSet.<String>of(),
>                 AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>                 AtlasTypeUtil.createOptionalAttrDef("address", "Address"),
>                 AtlasTypeUtil.createOptionalAttrDef("birthday", "date"),
>                 AtlasTypeUtil.createOptionalAttrDef("hasPets", "boolean"),
>                 AtlasTypeUtil.createOptionalAttrDef("numberOfCars", "byte"),
>                 AtlasTypeUtil.createOptionalAttrDef("houseNumber", "short"),
>                 AtlasTypeUtil.createOptionalAttrDef("carMileage", "int"),
>                 AtlasTypeUtil.createOptionalAttrDef("age", "float"),
> "
> For the parent-child relationships between glossary categories and terms we should be able to have the type system manage edge deletion. As part of this, we will need to investigate whether we could get rid of the disconnect and connect methods added in ATLAS-1186.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Re: [jira] [Commented] (ATLAS-1410) V2 Glossary API

Posted by Russell Anderson <rg...@us.ibm.com>.

The points that Mandy raises need to be addressed.

Russ

