
Posted to dev@atlas.apache.org by David Radley <da...@uk.ibm.com> on 2018/01/11 16:52:19 UTC

Tag propagation

Hi Madhan, 
I had a look in the code - I was surprised that the tag propagation was 
not in. Is this something you are looking at in the near future? If not I 
may need to look into it. I suggest the tag propagation implementation 
for phase 1 should:
- lose BOTH - this is still in the code - I think we agreed we wanted to 
get rid of this.
- honour the classification entityTypes - so that we do not get 
classifications applied to inappropriate entityTypes 
- There is the question about how the propagated classifications would 
look in the get entity REST API - I suggest that they appear in the 
entity's classifications with a field indicating that they are derived (and 
hence not able to be removed by an entity update). 
- I would hope that Ranger would pick up these new propagated tags using 
the existing tag sync. 
- I think you wanted the derived classifications to be picked up at query 
time. I also remember suggesting that we store the derived classifications 
in a derivedClassification property in the entity which would contain the 
list of derived classifications. Or we could store them as a new type of 
edge, "propagated classification" edges to the real classification. I like 
the edge idea.

If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive 
column would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm".   

What do you think?   all the best, David. 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Re: Tag propagation

Posted by Mandy Chessell <ma...@uk.ibm.com>.

Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both 
directions.  Most metadata relationships are not hierarchical.  They are 
two-way and different situations will call for different classifications 
to flow in each direction.  I do not remember the discussion on removing 
the BOTH option - but if I missed it I apologise.  What is the 
justification?

The enforcement of the classification's entity types should not prevent 
the propagation of the tag through an entity just because that entity does 
not support the tag.  Downstream entities may support the tag and need it 
to be propagated to them.  We need to work through more scenarios because 
we also need a way to bound tag propagation :)
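
To make the two policies under discussion concrete, here is a minimal 
sketch of a propagation walk that can either stop at an entity that does 
not support the tag or pass through it. The graph, the entity ids and the 
function name are all hypothetical and are not taken from the Atlas code.

```python
from collections import deque

# Hypothetical toy model: directed propagation edges between entity ids,
# plus the entity type of each entity. None of these names come from Atlas.
EDGES = {"term": ["process"], "process": ["column"], "column": []}
ENTITY_TYPE = {
    "term": "GlossaryTerm",
    "process": "Process",
    "column": "hive_column",
}


def propagate(start, applicable_types, pass_through):
    """Return the entities (other than start) that receive the tag.

    applicable_types: entity types the classification may be applied to.
    pass_through: if True, an entity that cannot hold the tag still forwards
    it to its neighbours; if False, propagation stops at that entity.
    """
    tagged = []
    seen = {start}
    queue = deque(EDGES[start])
    while queue:
        entity = queue.popleft()
        if entity in seen:
            continue
        seen.add(entity)
        supports = ENTITY_TYPE[entity] in applicable_types
        if supports:
            tagged.append(entity)
        # Keep walking if the entity holds the tag, or if we pass through.
        if supports or pass_through:
            queue.extend(EDGES[entity])
    return tagged
```

With pass_through=True the column behind the unsupported Process entity 
still receives the tag; with pass_through=False it does not. Neither 
variant addresses how to bound propagation, which is still an open 
question here.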

As an FYI, the OMRS API for classifications includes an origin attribute 
that lets us return classifications with an entity that are explicitly 
assigned or propagated to the entity.  Most callers will not care but some 
might.

All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - jsbrooks12@uk.ibm.com



From:   Madhan Neethiraj <ma...@apache.org>
To:     David Radley <da...@uk.ibm.com>, Sarath Subramanian 
<sa...@apache.org>
Cc:     atlas <de...@atlas.incubator.apache.org>
Date:   13/01/2018 02:14
Subject:        Re: Tag propagation



David,

 

Sarath was working on tag-propagation, but had to take up tasks related to 
JanusGraph and others. He will be resuming tag-propagation work next week; 
this feature would be part of Atlas-1.0.0 release.

 

- lose BOTH - this is still in the code - I think we agreed we wanted to 
get rid of this. 
Agree.

 

- should honour the classification entitytypes - so that we do not get 
classifications applied to inappropriate entityTypes 
Perhaps we should stop the propagation at the entity where the 
classification is not applicable? I think it wouldn’t be correct to block 
a classification association to an entity if the classification is not 
applicable for a down-stream entity.

 

- There is the question about how the propagated classifications would 
look in the get entity rest API  - I suggest that they appear in the 
entities classification with a field indicating that they are derived (and 
hence not able to be removed by an entity update). 
I was thinking about a separate attribute, 
AtlasEntity.propagatedClassifications, for this. However, I think your 
suggestion of adding a field to AtlasClassification is a better one; with 
this approach no changes would be needed in applications that process 
classifications on an entity. How about we capture the guid of the source 
entity on which the classification is associated, 
AtlasClassification.sourceEntityGuid? If this value is null, then the 
classification is associated with the current entity directly.
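
As an illustration only, the sourceEntityGuid idea could be modelled as 
below. This is a Python stand-in, not the Atlas Java class; the class 
name, the snake_case field and the helper function are assumptions made 
for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Classification:
    type_name: str
    # Mirrors the proposed AtlasClassification.sourceEntityGuid field:
    # None means the classification is associated with this entity directly.
    source_entity_guid: Optional[str] = None

    @property
    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None


def removable_by_entity_update(classifications: List[Classification]) -> List[str]:
    """Names of classifications an entity update may remove (direct ones only)."""
    return [c.type_name for c in classifications if not c.is_propagated]
```

Callers that ignore the extra field keep seeing one flat list of 
classifications, which is the compatibility argument made above.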

 

- I would hope that Ranger would pick up these new propagated tags using 
the existing tag sync. 
Yes. With the approach detailed above, no changes would be needed in 
Ranger.

 

- I think you wanted the derived classifications to be picked up at query 
time. I also remember suggesting that we store the derived classifications 
in a derivedClassification property in the entity which would contain the 
list of derived classifications. Or we could store them as a new type of 
edge "propagated classification" edges to the real classification. I like 
the edge idea. 
To enable queries like ‘get list of entities that are classified as PII’, 
it would be more performant if each entity vertex also had data about the 
propagated classifications, similar to the data entities currently have on 
classifications directly associated with them. However, 
all the entities should directly reference a single instance of a 
classification, so that it will be easier to manage changes to 
classification attribute values. Sarath will send an update on the design 
choices later next week.
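
A rough sketch of that storage idea, using plain dictionaries in place of 
graph vertices. The ids, keys and function name are hypothetical and say 
nothing about how Atlas actually stores this.

```python
# Hypothetical in-memory stand-ins for graph vertices; not Atlas storage code.
# One shared instance per classification, keyed by a made-up id.
classification_instances = {
    "c1": {"typeName": "PII", "attributes": {"level": "high"}},
}

# Each entity records the ids of its direct and propagated classifications,
# so a lookup needs no traversal of relationships at query time.
entity_vertices = {
    "term": {"direct": ["c1"], "propagated": []},
    "hive_column": {"direct": [], "propagated": ["c1"]},
    "hive_table": {"direct": [], "propagated": []},
}


def entities_classified_as(type_name):
    """Return ids of entities holding the classification, directly or not."""
    matches = []
    for entity_id, vertex in entity_vertices.items():
        ids = vertex["direct"] + vertex["propagated"]
        if any(classification_instances[i]["typeName"] == type_name for i in ids):
            matches.append(entity_id)
    return sorted(matches)
```

Because both entities refer to the single "c1" instance, a change to its 
attribute values is made in one place only.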

 

If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive 
column would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm". 
Yes. This use case should be covered by the design discussed above.

 

Thanks,

Madhan

 


Re: Tag propagation

Posted by David Radley <da...@uk.ibm.com>.

Hi Mandy,
I think your use cases make sense.

For the first use case, I am not sure what the confidential classification 
is here - is it a classification that is shipped with the open types? I 
assume that confidentiality would be a classification that has an ordered 
set of enumerated values, like "no classification", "internal use", 
"confidential". In this case, if a NoteEntry and a NoteLog both had the 
confidentiality classification but with different values, we would 
need to design for what happens; having BOTH on the Attached NoteLogEntry 
RelationshipDef does not seem sufficient. Maybe we have an implied 
escalation based on the enum order.
For the second case around dataset and datastore, I have the same concern 
- how do we determine what we should do when there are different levels of 
retention or criticality specified on each entity? 

I am also concerned about confidentiality, retention and criticality. I 
assume these classifications would be defined as being applicable to 
Referenceable or to any entityType. I am not sure on which 
RelationshipDefs these would flow, but there is a risk that they could 
inadvertently propagate more widely than we would like. I think it would 
be useful to understand all the proposed open metadata RelationshipDef 
tag propagations to know these use cases are reasonably addressed. I 
suspect we will want to associate classifications with RelationshipDefs so 
that RelationshipDefs can limit which classifications they propagate. 
There is also the idea that we may want to override the classifications 
that have been propagated on an individual entity. 
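
One possible shape for such a limit is an allow-list per relationship and 
direction, sketched below. The relationship name "DataSetToDataStore", the 
direction labels and the function are all made up for illustration; the 
classification names follow the retention, criticality and confidence 
example quoted below.

```python
# Hypothetical allow-lists: relationship name -> direction -> classification
# types that may propagate in that direction. Not an existing Atlas structure.
PROPAGATION_RULES = {
    "DataSetToDataStore": {
        "forward": {"Retention", "Criticality"},  # data set -> data store
        "backward": {"Confidence"},               # data store -> data set
    },
}


def propagates(relationship, direction, classification):
    """True if this relationship lets the classification flow that way."""
    rules = PROPAGATION_RULES.get(relationship, {})
    return classification in rules.get(direction, set())
```

A relationship with no entry propagates nothing, which is one way to 
reduce the risk of classifications spreading more widely than intended.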

I suggest we need mechanisms in addition to the BOTH propagateTags setting 
on a RelationshipDef for your use cases. 

  all the best, David. 





From:   Mandy Chessell <ma...@uk.ibm.com>
To:     dev@atlas.apache.org
Cc:     "Madhan Neethiraj" <ma...@apache.org>, "Sarath Subramanian" 
<sa...@apache.org>
Date:   15/01/2018 11:12
Subject:        Re: Tag propagation



Hello David,
I am not sure how many examples you need.  But here are a couple of 
patterns ...

When we have a cluster of entities that make up a logical collection of 
information - such as a NoteLog and its Notes nested inside (area 1) - 
a classification applied to any one element needs to be propagated both up 
and down.  For example, making a note log confidential makes all the notes 
inside confidential and making any note confidential makes the note log 
confidential (but not all of the other notes inside - if the confidential 
note is deleted then the note log is no longer confidential).  We will see 
similar behaviours with the dependency relationships between nested 
locations in area 0.

A second example is where the relationship is showing physical 
dependencies between entities that need to be respected.  For example, the 
relationship between DataSet and DataStore (area 2).  If a data set has a 
retention classification or criticality classification (area 4) then it 
needs to flow to the underlying data stores.  If the underlying data stores 
have confidence classifications then they should propagate to the 
DataSets.  We will see similar behaviours with the dependency 
relationships between server capabilities in area 0.

Make sense?

All the best
Mandy








From:   David Radley <da...@uk.ibm.com>
To:     Mandy Chessell <ma...@uk.ibm.com>
Cc:     dev@atlas.apache.org, "Madhan Neethiraj" <ma...@apache.org>, 
"Sarath Subramanian" <sa...@apache.org>
Date:   15/01/2018 10:05
Subject:        Re: Tag propagation

Hi Mandy,
From what I recall, we discussed some scenarios in which we felt tag 
propagation would be useful. I think the use cases we are thinking of are 
now indicated by the model files that have "propagateTags" set. The 
examples include the semanticClassification and the 
"hbase_table_column_families" relationships. We had not identified any use 
cases we felt were important where BOTH would be useful for a 
relationship, so we were thinking of removing that option. Do you have some 
relationships that require BOTH in the open types? It would be useful for 
me to understand why those relationships need BOTH.

         many thanks, David.












Re: Tag propagation

Posted by Mandy Chessell <ma...@uk.ibm.com>.

Hello David,
There is only one instance of a classification allowed on an entity.  A 
propagated classification cannot override an explicitly set 
classification.  When it comes to managing conflicts, there is nothing 
special about propagated classifications.  A new entity, or a new 
classification on an entity, or a new relationship needs to be validated 
and if it is invalid then the update is rejected.  Because the model is 
distributed, it is possible that updates in different servers may 
conflict and be discovered later as we synchronise metadata between 
members of the cohort.  These conflicts are reported through the OMRS 
Event Protocol and corrected through exception management processes. 

In the example of the note log, and assuming we are using the 
confidentiality classification defined in area 4 which has a sliding scale 
of enums as you state, if the NoteLog has an explicit classification of 
"internal use" then it would be invalid to add a note that has a higher 
value of the classification because the note log's classification is the 
high water mark for the note log.  So the request to add the confidential 
note would be rejected.  If the note log did not have any confidentiality 
classification then the confidential note could be added and 
classification propagation up the hierarchy would be in effect, making the 
note log confidential.
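
The rule above can be sketched roughly as follows. The level names are 
the example values from David's note rather than the actual area 4 enum, 
and the function names are hypothetical.

```python
from typing import List, Optional

# Ordered lowest to highest; example values only, not the area 4 definition.
LEVELS = ["no classification", "internal use", "confidential"]


def rank(level: str) -> int:
    return LEVELS.index(level)


def add_note(explicit_log_level: Optional[str],
             note_levels: List[str],
             new_note_level: str) -> List[str]:
    """Return the note levels after the add, or raise if it is rejected.

    An explicit classification on the note log is its high water mark, so a
    note with a higher level is invalid.
    """
    if explicit_log_level is not None and rank(new_note_level) > rank(explicit_log_level):
        raise ValueError("note exceeds the note log's explicit confidentiality")
    return note_levels + [new_note_level]


def effective_log_level(explicit_log_level: Optional[str],
                        note_levels: List[str]) -> str:
    """An explicit level wins; otherwise the highest note level propagates up."""
    if explicit_log_level is not None:
        return explicit_log_level
    return max(note_levels, key=rank, default=LEVELS[0])
```

Because the propagated level is derived from the notes currently present, 
deleting the only confidential note lowers the note log's effective level 
again, matching the earlier note log example.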

The classifications of confidentiality, retention and criticality are 
defined as valid for entities that inherit from Referenceable.  This is 
not a recent change - see model 422.  I agree we need to systematically 
work through the scenarios.  That was the point of my original note on 
this topic.  The BOTH option was being removed based on thinking through 
only 2 use cases that were not representative of the governance 
requirements.   I came up with 2 counter-examples in a few minutes and I 
am sure there are more.  I have not found a case yet where the existing 
configuration does not work - but I am not confident I have been through 
all of the scenarios either. 

This function needs a proper design and community review to get it right.  


All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - jsbrooks12@uk.ibm.com



From:   David Radley/UK/IBM
To:     Mandy Chessell/UK/IBM@IBMGB
Cc:     dev@atlas.apache.org
Date:   15/01/2018 11:49
Subject:        Re: Tag propagation


Hi Mandy,
I think you use cases make sense.

For the first use case, I am not sure what the confidential classification 
is here - is it a classification that is shipped with the open types? I 
assume that confidentiality would be a classification that has an ordered 
set of enumerated values, like "no classification", "internal use", 
"confidential". In this case if a NoteEntry and a NoteLog had the 
confidentiality classification on but with different values - we would 
need to design for what happens;having BOTH on the Attached NoteLogEntry 
RelationshipDef does not seem sufficient. Maybe we have an implied 
escalation based on the enum order.
For the second case around dataset and datastore, I have the same concern 
- how do we determine what we should do when there are different levels of 
retention or criticality specified on each entity. 

I am also concerned for confidentiality, retention and criticality, I 
assume these classifications would be defined as being applicable to 
Referenceable or to any entitytype. I am not sure on which 
RelationshipDefs these would flow on, but there is a risk that they could 
inadvertently propagate more widely that we would like. I think it would 
be useful to understand all the open metadata tag proposed RelationshipDef 
tag propagations to know these use cases are reasonably addressed. I 
suspect we will want to associate classifications with relationshipDefs so 
that relationshipDefs can limit which classifications they propagate. 
There is also the idea that we may want to override the classifications 
that have been propagated on an individual entity. 

I suggest we need additional mechanisms in addition to BOTH PropagateTags 
on a relationshipdef for your use cases. 

  all the best, David. 






From:   Mandy Chessell <ma...@uk.ibm.com>
To:     dev@atlas.apache.org
Cc:     "Madhan Neethiraj" <ma...@apache.org>, "Sarath Subramanian" 
<sa...@apache.org>
Date:   15/01/2018 11:12
Subject:        Re: Tag propagation



Hello David,

I am not sure how many examples you need.  But here are a couple of 

patterns ...



When we have a cluster of entities that make up a logical collection of 

information - such as a NoteLog and its Notes nested inside (area 1) - and 


a classification applied to any one element needs to be propagated both up 


and down.  For example, making a note log confidential makes all the notes 


inside confidential and making any note confidential makes the note log 

confidential (but not all of the other notes inside - if the confidential 

note is deleted then the note log is no longer confidential).  We will see 


similar behaviours with the dependency relationships between nested 

locations in area 0.



A second example is where the relationship is showing physical 

dependencies between entities that need to be respected.  For example, the 


relationship between DataSet and DataStore (Area 2).   If a data set has a 


retention classification or criticality classification (area 4) then it 

needs to flow to underlying data stores.  If the underlying data stores 

have a confidence classifications then they should propagate to the 

DataSets.  We will see similar behaviours with the dependency 

relationships between server capabilities in area 0.



Make sense?



All the best

Mandy

___________________________________________

Mandy Chessell CBE FREng CEng FBCS

IBM Distinguished Engineer



Master Inventor

Member of the IBM Academy of Technology

Visiting Professor, Department of Computer Science, University of 

Sheffield



Email: mandy_chessell@uk.ibm.com

LinkedIn: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.linkedin.com_pub_mandy-2Dchessell_22_897_a49&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=QhpUQPr5YlG95aAgCvZGStEXHg4hBbSYQ9JkRqR_svY&m=7nnEh29Xf_0tbQKuwQqj6Go9NNtkRhb2FPFwEMZCTtI&s=Z2PUY9QDU8hrSlXgtDVkeEGNomcasSHW48iWg4_voq4&e=




Assistant: Janet Brooks - jsbrooks12@uk.ibm.com







From:   David Radley <da...@uk.ibm.com>

To:     Mandy Chessell <ma...@uk.ibm.com>

Cc:     dev@atlas.apache.org, "Madhan Neethiraj" <ma...@apache.org>, 

"Sarath Subramanian" <sa...@apache.org>

Date:   15/01/2018 10:05

Subject:        Re: Tag propagation







Hi Mandy,



From what I recall, we discussed some scenarios that we felt Tag 



propagation would be useful. I think the use cases we are thinking of are 



now indicated by the model files that have "propagateTags" set. The 



examples include the semanticClassification and the 



"hbase_table_column_families" relationships. We had not identified any use 






cases we felt were important where BOTH would be useful for a 



relationship; so were thinking of removing that option. Do you have some 



relationships that require BOTH in the open types - it would be useful for 






me to understand why those relationships need BOTH, 



         many thanks , David. 











From:   Mandy Chessell/UK/IBM



To:     dev@atlas.apache.org



Cc:     David Radley <da...@uk.ibm.com>, atlas 



<de...@atlas.incubator.apache.org>, Sarath Subramanian <sa...@apache.org>



Date:   14/01/2018 13:25



Subject:        Re: Tag propagation











Hello Madhan, David,



I would not wish to remove the option to have tag propagation flow in both 






directions.  Most metadata relationships are not hierarchical.  They are 



two-way and different situations will cause for different classifications 



to flow in each direction.  I do not remember the discussion on removing 



the BOTH open - but if I missed it I apologise.  What is the 



justification?







The enforcement of the classification's entity types should not prevent 



the propagation of the tag through an entity because it does not support a 






tag.  Down stream entities may support the tag and need it to be 



propagated to them.  We need to work through more scenarios because we 



also need a way to bound tag propagation :)







As an FYI, the OMRS API for classifications includes an origin attribute 



that lets us return classifications with an entity that are explicitly 



assigned or propagated to the entity.  Most callers will not care but some 






might.







All the best



Mandy



___________________________________________



Mandy Chessell CBE FREng CEng FBCS



IBM Distinguished Engineer







Master Inventor



Member of the IBM Academy of Technology



Visiting Professor, Department of Computer Science, University of 



Sheffield







Email: mandy_chessell@uk.ibm.com



LinkedIn: 

https://urldefense.proofpoint.com/v2/url?u=http-3A__www.linkedin.com_pub_mandy-2Dchessell_22_897_a49&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=DEupm0k8-ppAmw6rImSmuE_tc4KzDG1cSUr7Fo_5T8Q&m=MV8WpwCeyTCRAC4oi3DRaoJFApKNSb616dYZRjPJeHQ&s=mwcUNR2iBI0bGMavvkqpv0C0bel2iQVHMCYcdaHZtng&e=










Assistant: Janet Brooks - jsbrooks12@uk.ibm.com



















From:   Madhan Neethiraj <ma...@apache.org>
To:     David Radley <da...@uk.ibm.com>, Sarath Subramanian 
<sa...@apache.org>
Cc:     atlas <de...@atlas.incubator.apache.org>
Date:   13/01/2018 02:14
Subject:        Re: Tag propagation



David,

Sarath was working on tag-propagation, but had to take up tasks related to 
JanusGraph and others. He will be resuming tag-propagation work next week; 
this feature would be part of Atlas-1.0.0 release.

- lose BOTH - this is still in the code - I think we agreed we wanted to 
get rid of this. 
Agree.

- should honour the classification entitytypes - so that we do not get 
classifications applied to inappropriate entityTypes 
Perhaps we should stop the propagation at the entity where the 
classification is not applicable? I think it wouldn’t be correct to block 
a classification association to an entity if the classification is not 
applicable for a down-stream entity.

- There is the question about how the propagated classifications would 
look in the get entity rest API  - I suggest that they appear in the 
entities classification with a field indicating that they are derived (and 
hence not able to be removed by an entity update). 
I was thinking about a separate attribute, 
AtlasEntity.propagatedClassifications, for this. However, I think your 
suggestion of adding a field to AtlasClassification is a better one; with 
this approach no changes would be needed in applications that process 
classifications on an entity. How about we capture the guid of the source 
entity on which the classification is associated, 
AtlasClassification.sourceEntityGuid? If this value is null, then the 
classification is associated with the current entity directly.

- I would hope that Ranger would pick up these new propagated tags using 
the existing tag sync. 
Yes. With the approach detailed above, no changes would be needed in 
Ranger.

- I think you wanted the derived classifications to be picked up at query 
time. I also remember suggesting that we store the derived classifications 
in a derivedClassifiation property in the entity which would contain the 
list of derived classifications. Or we could store them as a new type of 
edge "propagated classification" edges to the real classification. I like 
the edge idea. 
To enable queries like ‘get list of entities that are classified as PII’, 
it will be performant if each entity vertex has data about the propagated 
classifications as well, similar to entities having data on 
classifications directly associated with the entity currently. However, 
all the entities should directly reference a single instance of a 
classification, so that it will be easier to manage changes to 
classification attribute values. Sarath will send an update on the design 
choices later next week.

If we had the above, we could classify a Term as PSI, and use the semantic 
mapping to propagate the classifications to the hive column. The hive 
column would not pick up classifications defined in the area 3 model like 
"SpineObject", which is defined as only applying to "GlossaryTerm". 
Yes. This use case should be covered by the design discussed above.

Thanks,

Madhan
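
Madhan's suggestion above (record the GUID of the source entity on each classification, with null meaning the classification is attached directly) can be sketched roughly as follows. This is an illustration in Python, not Atlas code: the class name Classification, the field source_entity_guid, and both helper functions are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Classification:
    """Hypothetical stand-in for AtlasClassification."""
    type_name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    # Proposed field: GUID of the entity the classification was attached to.
    # None means it is attached directly to the entity that carries it.
    source_entity_guid: Optional[str] = None

    @property
    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None


def removable_by_entity_update(c: Classification) -> bool:
    """Derived classifications cannot be removed by updating the entity."""
    return not c.is_propagated


def type_names(classifications: List[Classification]) -> List[str]:
    """A consumer that ignores the new field keeps working unchanged."""
    return [c.type_name for c in classifications]


direct = Classification("PII")
derived = Classification("PII", source_entity_guid="guid-of-glossary-term")
```

A consumer such as a tag sync would see both classifications in the same list, which is the reason given in the thread for preferring a field on the classification over a separate attribute on the entity.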







 








Re: Tag propagation

Posted by David Radley <da...@uk.ibm.com>.

Hi Mandy,
Thanks for the extra detail; I can see the need to keep BOTH now. I like 
your proposal on how to resolve conflicts - I had not seen this. I assume 
removing a classification from an entity could enable tags to propagate to 
it. I suggest including the proposed tag propagation values on the 
relationshipDefs in the wiki.

    all the best, David. 



From:   Mandy Chessell/UK/IBM
To:     David Radley/UK/IBM@IBMGB
Cc:     dev@atlas.apache.org
Date:   15/01/2018 12:25
Subject:        Re: Tag propagation


Hello David,
There is only one instance of a classification allowed on an entity.  A 
propagated classification cannot override an explicitly set 
classification.  When it comes to managing conflicts, there is nothing 
special about propagated classifications.  A new entity, or a new 
classification to an entity, or a new relationship needs to be validated 
and if it is invalid then the update is rejected.  Because the model is 
distributed, it is possible that updates in different servers may 
conflict and be discovered later as we synchronise metadata between 
members of the cohort.  These conflicts are reported through the OMRS 
Event Protocol and corrected through exception management processes. 

In the example of the note log, assuming we are using the 
confidentiality classification defined in area 4, which has a sliding scale 
of enums as you state, and the NoteLog has an explicit classification of 
"internal use", it would be invalid to add a note that has a higher 
value of the classification, because the note log's classification is the 
high water mark for the note log.   So the request to add the confidential 
note would be rejected.  If the note log did not have any confidentiality 
classification then the confidential note could be added and 
classification propagation up the hierarchy would, in effect, make the 
note log confidential.
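
The high water mark rule described above can be sketched as a small validation routine. This is only an illustration under assumptions: the enum values are the three examples mentioned in this thread, and the NoteLog class with its add_note and effective methods is hypothetical, not part of Atlas or OMRS.

```python
from enum import IntEnum
from typing import List, Optional


class Confidentiality(IntEnum):
    # Ordered values taken from the examples in this thread.
    NO_CLASSIFICATION = 0
    INTERNAL_USE = 1
    CONFIDENTIAL = 2


class NoteLog:
    def __init__(self, explicit: Optional[Confidentiality] = None) -> None:
        self.explicit = explicit            # explicitly assigned value, if any
        self.notes: List[Confidentiality] = []

    def add_note(self, level: Confidentiality) -> None:
        # An explicit classification on the log is a high water mark:
        # a note above it is rejected rather than overriding it.
        if self.explicit is not None and level > self.explicit:
            raise ValueError("note exceeds the note log's explicit classification")
        self.notes.append(level)

    def effective(self) -> Confidentiality:
        # Explicit always wins; otherwise the highest note level propagates up.
        if self.explicit is not None:
            return self.explicit
        return max(self.notes, default=Confidentiality.NO_CLASSIFICATION)
```

With an explicit "internal use" on the log, adding a confidential note raises an error; with no explicit value, the same note is accepted and the log's effective level becomes confidential.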

The classifications of confidentiality, retention and criticality are 
defined as valid for entities that inherit from Referenceable.  This is 
not a recent change - see model 422.  I agree we need to systematically 
work through the scenarios.  That was the point of my original note on 
this topic.  The BOTH option was being removed based on thinking through 
only 2 use cases that were not representative of the governance 
requirements.   I came up with 2 counter-examples in a few minutes and I 
am sure there are more.  I have not found a case yet where the existing 
configuration does not work - but I am not confident I have been through 
all of the scenarios either. 

This function needs a proper design and community review to get it right.  


All the best
Mandy




From:   David Radley/UK/IBM
To:     Mandy Chessell/UK/IBM@IBMGB
Cc:     dev@atlas.apache.org
Date:   15/01/2018 11:49
Subject:        Re: Tag propagation


Hi Mandy,
I think your use cases make sense.

For the first use case, I am not sure what the confidential classification 
is here - is it a classification that is shipped with the open types? I 
assume that confidentiality would be a classification that has an ordered 
set of enumerated values, like "no classification", "internal use", 
"confidential". In this case, if a NoteEntry and a NoteLog both had the 
confidentiality classification but with different values, we would 
need to design for what happens; having BOTH on the Attached NoteLogEntry 
RelationshipDef does not seem sufficient. Maybe we have an implied 
escalation based on the enum order.
For the second case, around dataset and datastore, I have the same concern 
- how do we determine what we should do when there are different levels of 
retention or criticality specified on each entity? 

I am also concerned about confidentiality, retention and criticality. I 
assume these classifications would be defined as being applicable to 
Referenceable or to any entitytype. I am not sure on which 
RelationshipDefs these would flow, but there is a risk that they could 
inadvertently propagate more widely than we would like. I think it would 
be useful to see all the tag propagations proposed for the open metadata 
RelationshipDefs, to check that these use cases are reasonably addressed. I 
suspect we will want to associate classifications with relationshipDefs so 
that relationshipDefs can limit which classifications they propagate. 
There is also the idea that we may want to override the classifications 
that have been propagated on an individual entity. 

I suggest we need mechanisms in addition to BOTH PropagateTags 
on a relationshipdef for your use cases. 

  all the best, David. 
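
David's idea above of letting a relationshipDef limit which classifications it propagates could look something like the sketch below. Everything here is hypothetical: the allowed_classifications field is an invented extension, the relationship names are placeholders, and the direction values other than BOTH (NONE, ONE_TO_TWO, TWO_TO_ONE) are named as assumptions for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RelationshipDef:
    name: str
    propagate_tags: str  # "NONE", "ONE_TO_TWO", "TWO_TO_ONE" or "BOTH"
    # Hypothetical extension: None means "no restriction".
    allowed_classifications: Optional[FrozenSet[str]] = None

    def propagates(self, classification: str, direction: str) -> bool:
        """direction is "ONE_TO_TWO" or "TWO_TO_ONE"."""
        # First check that the relationship propagates in this direction at all.
        if self.propagate_tags not in (direction, "BOTH"):
            return False
        # Then apply the optional allow-list of classification names.
        return (self.allowed_classifications is None
                or classification in self.allowed_classifications)
```

For example, a two-way relationship restricted to a confidentiality classification would carry that classification in either direction but would not carry a retention classification at all.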






From:   Mandy Chessell <ma...@uk.ibm.com>
To:     dev@atlas.apache.org
Cc:     "Madhan Neethiraj" <ma...@apache.org>, "Sarath Subramanian" 
<sa...@apache.org>
Date:   15/01/2018 11:12
Subject:        Re: Tag propagation



Hello David,
I am not sure how many examples you need.  But here are a couple of 
patterns ...

When we have a cluster of entities that make up a logical collection of 
information - such as a NoteLog and its Notes nested inside (area 1) - 
a classification applied to any one element needs to be propagated both up 
and down.  For example, making a note log confidential makes all the notes 
inside confidential and making any note confidential makes the note log 
confidential (but not all of the other notes inside - if the confidential 
note is deleted then the note log is no longer confidential).  We will see 
similar behaviours with the dependency relationships between nested 
locations in area 0.

A second example is where the relationship is showing physical 
dependencies between entities that need to be respected.  An example is the 
relationship between DataSet and DataStore (area 2).   If a data set has a 
retention classification or criticality classification (area 4) then it 
needs to flow to underlying data stores.  If the underlying data stores 
have confidence classifications then they should propagate to the 
DataSets.  We will see similar behaviours with the dependency 
relationships between server capabilities in area 0.

Make sense?

All the best
Mandy


Re: Tag propagation

Posted by Mandy Chessell <ma...@uk.ibm.com>.

Hello David,
I am not sure how many examples you need.  But here are a couple of 
patterns ...

When we have a cluster of entities that make up a logical collection of 
information - such as a NoteLog and its Notes nested inside (area 1) - 
a classification applied to any one element needs to be propagated both up 
and down.  For example, making a note log confidential makes all the notes 
inside confidential and making any note confidential makes the note log 
confidential (but not all of the other notes inside - if the confidential 
note is deleted then the note log is no longer confidential).  We will see 
similar behaviours with the dependency relationships between nested 
locations in area 0.

A second example is where the relationship is showing physical 
dependencies between entities that need to be respected.  An example is the 
relationship between DataSet and DataStore (area 2).   If a data set has a 
retention classification or criticality classification (area 4) then it 
needs to flow to underlying data stores.  If the underlying data stores 
have confidence classifications then they should propagate to the 
DataSets.  We will see similar behaviours with the dependency 
relationships between server capabilities in area 0.
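
The note log pattern above (a tag flows both up and down, but a tag that reached the log from one note does not spread to sibling notes) can be sketched as below. This is one possible reading, not a description of Atlas behaviour: it assumes that only explicitly assigned tags propagate and that they travel a single hop; the function name and arguments are hypothetical.

```python
from typing import Dict, Set, Tuple


def effective_tags(log_tags: Set[str],
                   note_tags: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (effective tags of the log, effective tags of each note).

    log_tags and note_tags hold explicitly assigned tags only.
    """
    # Up: the log picks up every tag explicitly set on any of its notes.
    log_effective = set(log_tags).union(*note_tags.values())
    # Down: each note picks up only the log's own explicit tags, so a tag
    # that came up from one note is not pushed on to its siblings.
    notes_effective = {name: tags | log_tags for name, tags in note_tags.items()}
    return log_effective, notes_effective
```

Because the result is recomputed from explicit tags, deleting the confidential note and recomputing leaves the log without the tag, which matches the behaviour described in the example.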

Make sense?

All the best
Mandy



From:   David Radley <da...@uk.ibm.com>
To:     Mandy Chessell <ma...@uk.ibm.com>
Cc:     dev@atlas.apache.org, "Madhan Neethiraj" <ma...@apache.org>, 
"Sarath Subramanian" <sa...@apache.org>
Date:   15/01/2018 10:05
Subject:        Re: Tag propagation



Hi Mandy,
From what I recall, we discussed some scenarios in which we felt tag 
propagation would be useful. I think the use cases we are thinking of are 
now indicated by the model files that have "propagateTags" set. The 
examples include the semanticClassification and the 
"hbase_table_column_families" relationships. We had not identified any use 
cases we felt were important where BOTH would be useful for a 
relationship, so we were thinking of removing that option. Do you have some 
relationships that require BOTH in the open types? It would be useful for 
me to understand why those relationships need BOTH.
         many thanks, David. 


Re: Tag propagation

Posted by David Radley <da...@uk.ibm.com>.

Hi Mandy,
From what I recall, we discussed some scenarios in which we felt tag 
propagation would be useful. I think the use cases we are thinking of are 
now indicated by the model files that have "propagateTags" set. The 
examples include the semanticClassification and the 
"hbase_table_column_families" relationships. We had not identified any use 
cases we felt were important where BOTH would be useful for a 
relationship, so we were thinking of removing that option. Do you have some 
relationships that require BOTH in the open types? It would be useful for 
me to understand why those relationships need BOTH.
         many thanks, David. 


From:   Mandy Chessell/UK/IBM
To:     dev@atlas.apache.org
Cc:     David Radley <da...@uk.ibm.com>, atlas 
<de...@atlas.incubator.apache.org>, Sarath Subramanian <sa...@apache.org>
Date:   14/01/2018 13:25
Subject:        Re: Tag propagation


Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both 
directions.  Most metadata relationships are not hierarchical.  They are 
two-way, and different situations will call for different classifications 
to flow in each direction.  I do not remember the discussion on removing 
the BOTH option - but if I missed it I apologise.  What is the 
justification?

The enforcement of the classification's entity types should not prevent 
the propagation of the tag through an entity merely because that entity 
does not support the tag.  Downstream entities may support the tag and 
need it to be propagated to them.  We need to work through more scenarios 
because we also need a way to bound tag propagation :)

As an FYI, the OMRS API for classifications includes an origin attribute 
that lets us indicate, for each classification returned with an entity, 
whether it was explicitly assigned or propagated to the entity.  Most 
callers will not care but some might.

All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer

Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of 
Sheffield

Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49

Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
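
Mandy's point about letting a tag pass through entities that cannot hold it, while still bounding how far it travels, can be sketched as a breadth-first walk. This is only an illustration under stated assumptions: the graph layout, the max_hops bound, and the pass_through flag are hypothetical names introduced here. Setting pass_through to False gives the alternative policy of stopping at the first entity the classification does not apply to.

```python
from collections import deque


def propagate(graph, entity_types, start, applicable_types, max_hops, pass_through=True):
    """Breadth-first propagation of one classification from `start`.

    graph: dict guid -> list of guids reachable over tag-propagating edges
    entity_types: dict guid -> entity type name
    applicable_types: entity types the classification may be attached to
    max_hops: hypothetical bound on how far the tag may travel
    pass_through: if True, keep walking through entities that cannot hold
        the tag; if False, stop at the first such entity
    Returns the set of guids that receive the propagated classification.
    """
    tagged = set()
    seen = {start}  # guards against cycles, e.g. two-way relationships
    queue = deque([(start, 0)])
    while queue:
        guid, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in graph.get(guid, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            applicable = entity_types[nxt] in applicable_types
            if applicable:
                tagged.add(nxt)
            if applicable or pass_through:
                queue.append((nxt, hops + 1))
    return tagged


# Hypothetical three-entity chain: term -> process -> column.
graph = {"term": ["process"], "process": ["column"], "column": []}
types = {"term": "GlossaryTerm", "process": "Process", "column": "hive_column"}
pii_types = {"GlossaryTerm", "hive_column"}

print(sorted(propagate(graph, types, "term", pii_types, max_hops=5)))                      # ['column']
print(sorted(propagate(graph, types, "term", pii_types, max_hops=5, pass_through=False)))  # []
```

In the pass-through case the column is tagged even though the intermediate process cannot carry the classification; in the stop-at-boundary case nothing downstream of the process is reached.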





Re: Tag propagation

Posted by Madhan Neethiraj <ma...@apache.org>.

David,

 

Sarath was working on tag propagation, but had to take up tasks related to JanusGraph and others. He will be resuming tag-propagation work next week; this feature would be part of the Atlas 1.0.0 release.

 

- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this. 
Agree.

 

- should honour the classification entityTypes - so that we do not get classifications applied to inappropriate entityTypes 
Perhaps we should stop the propagation at the entity where the classification is not applicable? I think it wouldn’t be correct to block associating a classification with an entity just because the classification is not applicable to a downstream entity.

 

- There is the question about how the propagated classifications would look in the get entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence not able to be removed by an entity update). 
I was thinking about a separate attribute, AtlasEntity.propagatedClassifications, for this. However, I think your suggestion of adding a field to AtlasClassification is a better one; with this approach no changes would be needed in applications that process classifications on an entity. How about we capture the guid of the source entity with which the classification is associated, AtlasClassification.sourceEntityGuid? If this value is null, then the classification is associated with the current entity directly.

 

- I would hope that Ranger would pick up these new propagated tags using the existing tag sync. 
Yes. With the approach detailed above, no changes would be needed in Ranger.

 

- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge, "propagated classification" edges, to the real classification. I like the edge idea. 
To enable queries like ‘get list of entities that are classified as PII’, it would perform better if each entity vertex also held data about its propagated classifications, similar to how entities currently hold data on the classifications directly associated with them. However, all the entities should directly reference a single instance of a classification, so that it will be easier to manage changes to classification attribute values. Sarath will send an update on the design choices later next week.

 

If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm". 
Yes. This use case should be covered by the design discussed above.

 

Thanks,

Madhan
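
The sourceEntityGuid idea and the "single instance of a classification" point can be illustrated with a small sketch. The class and method names below (Classification, ClassificationRef, Entity, remove_classification) are hypothetical and are not the Atlas model classes; splitting the origin marker into a separate reference object is an assumption made here only to show both ideas together.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Classification:
    """The single shared instance that every tagged entity points to."""
    type_name: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class ClassificationRef:
    """What an entity stores: the shared instance plus where it came from."""
    classification: Classification
    source_entity_guid: Optional[str] = None  # None means attached directly

    @property
    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None


@dataclass
class Entity:
    guid: str
    classifications: List[ClassificationRef] = field(default_factory=list)

    def remove_classification(self, type_name: str) -> None:
        """Remove a directly attached classification; refuse propagated ones."""
        for ref in self.classifications:
            if ref.classification.type_name == type_name:
                if ref.is_propagated:
                    raise ValueError(
                        "propagated from " + ref.source_entity_guid
                        + "; remove it at the source entity"
                    )
                self.classifications.remove(ref)
                return
        raise KeyError(type_name)


pii = Classification("PII", {"level": "high"})
term = Entity("term-1", [ClassificationRef(pii)])
column = Entity("column-1", [ClassificationRef(pii, source_entity_guid=term.guid)])

pii.attributes["level"] = "low"  # one change, visible from both entities
print(column.classifications[0].classification.attributes["level"])  # low
print(column.classifications[0].is_propagated)                       # True
```

Because both entities hold a reference to the same object, an attribute change is made once, and the origin marker lets an entity update reject removal of a classification that was propagated rather than assigned directly.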

 
