Posted to dev@atlas.apache.org by David Radley <da...@uk.ibm.com> on 2018/01/11 16:52:19 UTC
Tag propagation
Hi Madhan,
I had a look at the code - I was surprised that tag propagation was not
in. Is this something you are looking at in the near future? If not, I
may need to look into it. I suggest that phase 1 of the tag propagation
implementation should:
- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
- honour the classification entityTypes - so that we do not get
classifications applied to inappropriate entityTypes
- There is the question of how the propagated classifications would
look in the get entity REST API - I suggest that they appear in the
entity's classifications with a field indicating that they are derived
(and hence cannot be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassification property in the entity, which would contain the
list of derived classifications. Or we could store them as a new type of
edge: "propagated classification" edges to the real classification. I like
the edge idea.
If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
What do you think? All the best, David.
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: Tag propagation
Posted by Mandy Chessell <ma...@uk.ibm.com>.
Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both
directions. Most metadata relationships are not hierarchical. They are
two-way, and different situations will call for different classifications
to flow in each direction. I do not remember the discussion on removing
the BOTH option - but if I missed it I apologise. What is the
justification?
The enforcement of the classification's entity types should not prevent
the propagation of a tag through an entity just because that entity does
not support the tag. Downstream entities may support the tag and need it
to be propagated to them. We need to work through more scenarios because we
also need a way to bound tag propagation :)
As an FYI, the OMRS API for classifications includes an origin attribute
that indicates whether each classification returned with an entity was
explicitly assigned or propagated to it. Most callers will not care but
some might.
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
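The two positions on entity-type enforcement raised in this thread (stop
propagation at an entity whose type does not accept the classification, or
let the tag pass through to downstream entities that do accept it) can be
compared with a minimal sketch. This is an illustration only, not Atlas
code: the function name, the dictionary-based graph and the term/view/column
chain are all hypothetical.

```python
from collections import deque


def propagate(start, edges, applicable, tag, pass_through=True):
    """Return the set of entities that end up carrying `tag`.

    edges:        entity name -> list of downstream entity names (hypothetical graph)
    applicable:   entity name -> set of tags that the entity's type accepts
    pass_through: True  = an entity that cannot carry the tag still forwards it
                  False = propagation stops at that entity
    """
    tagged, seen, queue = set(), {start}, deque([start])
    while queue:
        entity = queue.popleft()
        if tag in applicable.get(entity, set()):
            tagged.add(entity)
        elif not pass_through:
            continue  # stop here: nothing downstream of this entity is visited
        for nxt in edges.get(entity, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return tagged


# Hypothetical chain: term -> view -> column; the view's type does not accept "PII".
edges = {"term": ["view"], "view": ["column"]}
applicable = {"term": {"PII"}, "view": set(), "column": {"PII"}}

passed = propagate("term", edges, applicable, "PII", pass_through=True)
stopped = propagate("term", edges, applicable, "PII", pass_through=False)
```

With pass-through the column is tagged even though the view in between is
not; with stop-at-inapplicable only the term keeps the tag.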
From: Madhan Neethiraj <ma...@apache.org>
To: David Radley <da...@uk.ibm.com>, Sarath Subramanian
<sa...@apache.org>
Cc: atlas <de...@atlas.incubator.apache.org>
Date: 13/01/2018 02:14
Subject: Re: Tag propagation
David,
Sarath was working on tag propagation, but had to take up tasks related to
JanusGraph and others. He will be resuming tag-propagation work next week;
this feature would be part of the Atlas 1.0.0 release.
> - lose BOTH - this is still in the code - I think we agreed we wanted to
> get rid of this.
Agree.
> - should honour the classification entitytypes - so that we do not get
> classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the
classification is not applicable? I think it wouldn’t be correct to block
a classification association to an entity if the classification is not
applicable for a down-stream entity.
> - There is the question about how the propagated classifications would
> look in the get entity rest API - I suggest that they appear in the
> entities classification with a field indicating that they are derived (and
> hence not able to be removed by an entity update).
I was thinking about a separate attribute,
AtlasEntity.propagatedClassifications, for this. However, I think your
suggestion of adding a field to AtlasClassification is a better one; with
this approach no changes would be needed in applications that process
classifications on an entity. How about we capture the guid of the source
entity on which the classification is associated,
AtlasClassification.sourceEntityGuid? If this value is null, then the
classification is associated with the current entity directly.
> - I would hope that Ranger would pick up these new propagated tags using
> the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in
Ranger.
> - I think you wanted the derived classifications to be picked up at query
> time. I also remember suggesting that we store the derived classifications
> in a derivedClassifiation property in the entity which would contain the
> list of derived classifications. Or we could store them as a new type of
> edge "propagated classification" edges to the real classification. I like
> the edge idea.
To enable queries like ‘get list of entities that are classified as PII’,
it would perform better if each entity vertex also held data about its
propagated classifications, just as entities currently hold data on the
classifications directly associated with them. However,
all the entities should directly reference a single instance of a
classification, so that it will be easier to manage changes to
classification attribute values. Sarath will send an update on the design
choices later next week.
> If we had the above, we could classify a Term as PSI, and use the semantic
> mapping to propagate the classifications to the hive column. The hive
> column would not pick up classifications defined in the area 3 model like
> "SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.
Thanks,
Madhan
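The sourceEntityGuid idea proposed in the message above can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
Atlas model: the Classification class, the source_entity_guid field name,
the remove_classification helper and the sample guid are all hypothetical
stand-ins for the proposed AtlasClassification.sourceEntityGuid.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Classification:
    type_name: str
    # Hypothetical mirror of the proposed AtlasClassification.sourceEntityGuid:
    # None means the classification was applied directly to this entity.
    source_entity_guid: Optional[str] = None

    @property
    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None


def remove_classification(classifications: List[Classification],
                          type_name: str) -> List[Classification]:
    """Return the list without `type_name`; refuse to remove a propagated entry."""
    for c in classifications:
        if c.type_name == type_name and c.is_propagated:
            raise ValueError(
                f"{type_name} is propagated from {c.source_entity_guid}")
    return [c for c in classifications if c.type_name != type_name]


# One propagated and one directly applied classification on a hypothetical column.
column_tags = [Classification("PII", source_entity_guid="term-guid-1"),
               Classification("Reviewed")]
```

A caller that only reads type_name keeps working unchanged, which is the
compatibility argument made for putting the field on the classification
rather than on the entity.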
Re: Tag propagation
Posted by David Radley <da...@uk.ibm.com>.
Hi Mandy,
I think your use cases make sense.
For the first use case, I am not sure what the confidential classification
is here - is it a classification that is shipped with the open types? I
assume that confidentiality would be a classification that has an ordered
set of enumerated values, like "no classification", "internal use",
"confidential". In this case, if a NoteEntry and a NoteLog both had the
confidentiality classification but with different values, we would
need to design for what happens; having BOTH on the Attached NoteLogEntry
RelationshipDef does not seem sufficient. Maybe we have an implied
escalation based on the enum order.
For the second case around dataset and datastore, I have the same concern
- how do we determine what we should do when there are different levels of
retention or criticality specified on each entity?
I am also concerned about confidentiality, retention and criticality. I
assume these classifications would be defined as being applicable to
Referenceable or to any entityType. I am not sure which
RelationshipDefs these would flow on, but there is a risk that they could
inadvertently propagate more widely than we would like. I think it would
be useful to review all the tag propagations proposed on the open metadata
RelationshipDefs, to check that these use cases are reasonably addressed. I
suspect we will want to associate classifications with relationshipDefs so
that relationshipDefs can limit which classifications they propagate.
There is also the idea that we may want to override the classifications
that have been propagated on an individual entity.
I suggest we need mechanisms in addition to BOTH PropagateTags
on a relationshipDef for your use cases.
all the best, David.
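The suggestion above of associating classifications with relationshipDefs,
so that each relationshipDef limits what it propagates, could look roughly
like the sketch below. Everything here is an assumption for illustration:
the allow-list structure, the function name and the relationship and
classification names are hypothetical and are not taken from the Atlas or
open metadata type definitions.

```python
# Hypothetical allow-lists: relationship def name -> classifications it may propagate.
PROPAGATION_ALLOW_LIST = {
    "AttachedNoteLogEntry": {"Confidentiality"},
    "DataSetToDataStore": {"Retention", "Criticality"},
}


def propagated_over(relationship_def, classifications):
    """Return, in sorted order, the classifications this relationship def would carry.

    A relationship def with no allow-list entry propagates nothing, so a
    classification cannot leak over a relationship nobody has reviewed.
    """
    allowed = PROPAGATION_ALLOW_LIST.get(relationship_def, set())
    return sorted(c for c in classifications if c in allowed)


note_tags = ["Confidentiality", "Retention"]
carried = propagated_over("AttachedNoteLogEntry", note_tags)
```

Defaulting to "propagate nothing" is one design choice among several; an
implementation could equally default to propagating everything and use a
deny-list instead.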
From: Mandy Chessell <ma...@uk.ibm.com>
To: dev@atlas.apache.org
Cc: "Madhan Neethiraj" <ma...@apache.org>, "Sarath Subramanian"
<sa...@apache.org>
Date: 15/01/2018 11:12
Subject: Re: Tag propagation
Hello David,
I am not sure how many examples you need. But here are a couple of
patterns ...
One pattern is a cluster of entities that make up a logical collection of
information - such as a NoteLog and its Notes nested inside (area 1) -
where a classification applied to any one element needs to be propagated
both up and down. For example, making a note log confidential makes all the
notes inside confidential and making any note confidential makes the note log
confidential (but not all of the other notes inside - if the confidential
note is deleted then the note log is no longer confidential). We will see
similar behaviours with the dependency relationships between nested
locations in area 0.
A second example is where the relationship is showing physical
dependencies between entities that need to be respected. For example, the
relationship between DataSet and DataStore (Area 2). If a data set has a
retention classification or criticality classification (area 4) then it
needs to flow to underlying data stores. If the underlying data stores
have confidence classifications then they should propagate to the
DataSets. We will see similar behaviours with the dependency
relationships between server capabilities in area 0.
Make sense?
All the best
Mandy
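The DataSet and DataStore example can be made concrete with a one-step
sketch of direction-aware propagation across a single relationship. This is
an illustration under assumptions, not the Atlas implementation: BOTH is the
option discussed in this thread, while the other enum value names, the
function name and the tag names are hypothetical.

```python
from enum import Enum


class PropagateTags(Enum):
    # BOTH is named in the thread; the other value names are assumptions.
    NONE = "NONE"
    ONE_TO_TWO = "ONE_TO_TWO"
    TWO_TO_ONE = "TWO_TO_ONE"
    BOTH = "BOTH"


def propagate_one_hop(end1_tags, end2_tags, mode):
    """Return the (end1, end2) tag sets after one propagation step.

    end1_tags and end2_tags are sets of classification names held by the
    entities at each end of the relationship before propagation.
    """
    new1, new2 = set(end1_tags), set(end2_tags)
    if mode in (PropagateTags.ONE_TO_TWO, PropagateTags.BOTH):
        new2 |= end1_tags  # tags flow from end 1 to end 2
    if mode in (PropagateTags.TWO_TO_ONE, PropagateTags.BOTH):
        new1 |= end2_tags  # tags flow from end 2 to end 1
    return new1, new2


data_set, data_store = {"Retention"}, {"Confidence"}
both = propagate_one_hop(data_set, data_store, PropagateTags.BOTH)
down_only = propagate_one_hop(data_set, data_store, PropagateTags.ONE_TO_TWO)
```

With a single direction setting per relationship, BOTH sends every
classification both ways; it cannot by itself express "retention flows down,
confidence flows up", which is why a per-classification rule may also be
needed.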
From: David Radley <da...@uk.ibm.com>
To: Mandy Chessell <ma...@uk.ibm.com>
Cc: dev@atlas.apache.org, "Madhan Neethiraj" <ma...@apache.org>,
"Sarath Subramanian" <sa...@apache.org>
Date: 15/01/2018 10:05
Subject: Re: Tag propagation
Hi Mandy,
From what I recall, we discussed some scenarios where we felt tag
propagation would be useful. I think the use cases we were thinking of are
now indicated by the model files that have "propagateTags" set. The
examples include the semanticClassification and the
"hbase_table_column_families" relationships. We had not identified any
important use cases where BOTH would be useful for a
relationship, so we were thinking of removing that option. Do you have some
relationships that require BOTH in the open types? It would be useful for
me to understand why those relationships need BOTH.
Many thanks, David.
Re: Tag propagation
Posted by Mandy Chessell <ma...@uk.ibm.com>.
Hello David,
There is only one instance of a classification allowed on an entity. A
propagated classification cannot override an explicitly set
classification. When it comes to managing conflicts, there is nothing
special about propagated classifications. A new entity, a new
classification on an entity, or a new relationship needs to be validated,
and if it is invalid then the update is rejected. Because the model is
distributed, it is possible that updates in different servers may
conflict and be discovered later as we synchronise metadata between
members of the cohort. These conflicts are reported through the OMRS
Event Protocol and corrected through exception management processes.
In the example of the note log, assume we are using the
confidentiality classification defined in area 4, which has a sliding scale
of enums as you state. If the NoteLog has an explicit classification of
"internal use" then it would be invalid to add a note that has a higher
value of the classification, because the note log's classification is the
high water mark for the note log. So the request to add the confidential
note would be rejected. If the note log did not have any confidentiality
classification then the confidential note could be added, and
classification propagation up the hierarchy would in effect make the
note log confidential.
The classifications of confidentiality, retention and criticality are
defined as valid for entities that inherit from Referenceable. This is
not a recent change - see model 422. I agree we need to systematically
work through the scenarios. That was the point of my original note on
this topic. The BOTH option was being removed based on thinking through
only 2 use cases that were not representative of the governance
requirements. I came up with 2 counter-examples in a few minutes and I
am sure there are more. I have not found a case yet where the existing
configuration does not work - but I am not confident I have been through
all of the scenarios either.
This function needs a proper design and community review to get it right.
All the best
Mandy
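The high-water-mark rule described above for the note log can be sketched as
a small validation function. This is a minimal illustration under
assumptions: the enum ordering follows the values mentioned in the thread,
and the Confidentiality class and add_note function are hypothetical names,
not part of OMRS or Atlas.

```python
from enum import IntEnum
from typing import Optional


class Confidentiality(IntEnum):
    # Hypothetical ordering of the values mentioned in the thread.
    NO_CLASSIFICATION = 0
    INTERNAL_USE = 1
    CONFIDENTIAL = 2


def add_note(explicit_log_level: Optional[Confidentiality],
             note_level: Confidentiality) -> Confidentiality:
    """Return the note log's effective level after adding a note.

    explicit_log_level is the level set explicitly on the note log, or None
    if the note log has no confidentiality classification of its own.
    """
    if explicit_log_level is None:
        # Nothing explicit on the log: the note's level propagates up to it.
        return note_level
    if note_level > explicit_log_level:
        # The explicit level is the high water mark, so the update is rejected.
        raise ValueError("note exceeds the note log's explicit classification")
    return explicit_log_level


effective = add_note(None, Confidentiality.CONFIDENTIAL)
```

The sketch covers only the two cases spelled out in the message; what should
happen when the note log's level was itself propagated is left open here, as
it is in the thread.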
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
From: David Radley/UK/IBM
To: Mandy Chessell/UK/IBM@IBMGB
Cc: dev@atlas.apache.org
Date: 15/01/2018 11:49
Subject: Re: Tag propagation
Hi Mandy,
I think you use cases make sense.
For the first use case, I am not sure what the confidential classification
is here - is it a classification that is shipped with the open types? I
assume that confidentiality would be a classification that has an ordered
set of enumerated values, like "no classification", "internal use",
"confidential". In this case if a NoteEntry and a NoteLog had the
confidentiality classification on but with different values - we would
need to design for what happens;having BOTH on the Attached NoteLogEntry
RelationshipDef does not seem sufficient. Maybe we have an implied
escalation based on the enum order.
For the second case around dataset and datastore, I have the same concern
- how do we determine what we should do when there are different levels of
retention or criticality specified on each entity.
I am also concerned for confidentiality, retention and criticality, I
assume these classifications would be defined as being applicable to
Referenceable or to any entitytype. I am not sure on which
RelationshipDefs these would flow on, but there is a risk that they could
inadvertently propagate more widely that we would like. I think it would
be useful to understand all the open metadata tag proposed RelationshipDef
tag propagations to know these use cases are reasonably addressed. I
suspect we will want to associate classifications with relationshipDefs so
that relationshipDefs can limit which classifications they propagate.
There is also the idea that we may want to override the classifications
that have been propagated on an individual entity.
I suggest we need additional mechanisms in addition to BOTH PropagateTags
on a relationshipdef for your use cases.
all the best, David.
From: Mandy Chessell <ma...@uk.ibm.com>
To: dev@atlas.apache.org
Cc: "Madhan Neethiraj" <ma...@apache.org>, "Sarath Subramanian"
<sa...@apache.org>
Date: 15/01/2018 11:12
Subject: Re: Tag propagation
Hello David,
I am not sure how many examples you need. But here are a couple of
patterns ...
When we have a cluster of entities that make up a logical collection of
information - such as a NoteLog and its Notes nested inside (area 1) - and
a classification applied to any one element needs to be propagated both up
and down. For example, making a note log confidential makes all the notes
inside confidential and making any note confidential makes the note log
confidential (but not all of the other notes inside - if the confidential
note is deleted then the note log is no longer confidential). We will see
similar behaviours with the dependency relationships between nested
locations in area 0.
A second example is where the relationship is showing physical
dependencies between entities that need to be respected. For example, the
relationship between DataSet and DataStore (Area 2). If a data set has a
retention classification or criticality classification (area 4) then it
needs to flow to underlying data stores. If the underlying data stores
have a confidence classifications then they should propagate to the
DataSets. We will see similar behaviours with the dependency
relationships between server capabilities in area 0.
Make sense?
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn:
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.linkedin.com_pub_mandy-2Dchessell_22_897_a49&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=QhpUQPr5YlG95aAgCvZGStEXHg4hBbSYQ9JkRqR_svY&m=7nnEh29Xf_0tbQKuwQqj6Go9NNtkRhb2FPFwEMZCTtI&s=Z2PUY9QDU8hrSlXgtDVkeEGNomcasSHW48iWg4_voq4&e=
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
From: David Radley <da...@uk.ibm.com>
To: Mandy Chessell <ma...@uk.ibm.com>
Cc: dev@atlas.apache.org, "Madhan Neethiraj" <ma...@apache.org>,
"Sarath Subramanian" <sa...@apache.org>
Date: 15/01/2018 10:05
Subject: Re: Tag propagation
Hi Mandy,
From what I recall, we discussed some scenarios that we felt Tag
propagation would be useful. I think the use cases we are thinking of are
now indicated by the model files that have "propagateTags" set. The
examples include the semanticClassification and the
"hbase_table_column_families" relationships. We had not identified any use
cases we felt were important where BOTH would be useful for a
relationship; so were thinking of removing that option. Do you have some
relationships that require BOTH in the open types - it would be useful for
me to understand why those relationships need BOTH,
many thanks , David.
From: Mandy Chessell/UK/IBM
To: dev@atlas.apache.org
Cc: David Radley <da...@uk.ibm.com>, atlas
<de...@atlas.incubator.apache.org>, Sarath Subramanian <sa...@apache.org>
Date: 14/01/2018 13:25
Subject: Re: Tag propagation
Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both
directions. Most metadata relationships are not hierarchical. They are
two-way and different situations will cause for different classifications
to flow in each direction. I do not remember the discussion on removing
the BOTH open - but if I missed it I apologise. What is the
justification?
The enforcement of the classification's entity types should not prevent
the propagation of the tag through an entity because it does not support a
tag. Down stream entities may support the tag and need it to be
propagated to them. We need to work through more scenarios because we
also need a way to bound tag propagation :)
As an FYI, the OMRS API for classifications includes an origin attribute
that lets us return classifications with an entity that are explicitly
assigned or propagated to the entity. Most callers will not care but some
might.
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn:
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.linkedin.com_pub_mandy-2Dchessell_22_897_a49&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=DEupm0k8-ppAmw6rImSmuE_tc4KzDG1cSUr7Fo_5T8Q&m=MV8WpwCeyTCRAC4oi3DRaoJFApKNSb616dYZRjPJeHQ&s=mwcUNR2iBI0bGMavvkqpv0C0bel2iQVHMCYcdaHZtng&e=
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
From: Madhan Neethiraj <ma...@apache.org>
To: David Radley <da...@uk.ibm.com>, Sarath Subramanian
<sa...@apache.org>
Cc: atlas <de...@atlas.incubator.apache.org>
Date: 13/01/2018 02:14
Subject: Re: Tag propagation
David,
Sarath was working on tag-propagation, but had to take up tasks related to
JanusGraph and others. He will be resuming tag-propagation work next week;
this feature would be part of Atlas-1.0.0 release.
- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
Agree.
- should honour the classification entitytypes - so that we do not get
classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the
classification is not applicable? I think it wouldn’t be correct to block
a classification association to an entity if the classification is not
applicable for a down-stream entity.
- There is the question about how the propagated classifications would
look in the get entity rest API - I suggest that they appear in the
entities classification with a field indicating that they are derived (and
hence not able to be removed by an entity update).
I was thinking about a separate attribute,
AtlasEntity.propagatedClassifications, for this. However, I think your
suggestion of adding a field to AtlasClassification is a better one; with
this approach no changes would be needed in applications that process
classifications on an entity. How about we capture the guid of the source
entity on which the classification is associated,
AtlasClassification.sourceEntityGuid? If this value is null, then the
classification is associated with the current entity directly.
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in
Ranger.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassifiation property in the entity which would contain the
list of derived classifications. Or we could store them as a new type of
edge "propagated classification" edges to the real classification. I like
the edge idea.
To enable queries like ‘get list of entities that are classified as PII’,
it will be performant if each entity vertex has data about the propagated
classifications as well, similar to entities having data on
classifications directly associated with the entity currently. However,
all the entities should directly reference a single instance of a
classification, so that it will be easier to manage changes to
classification attribute values. Sarath will send an update on the design
choices later next week.
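A minimal sketch of the storage idea above, assuming each entity vertex keeps both its direct and its propagated classifications by name while every entity refers to one shared classification instance. All names here are hypothetical; this is not the Atlas graph schema.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationInstance:
    type_name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class EntityVertex:
    guid: str
    # type name -> instance applied directly to this entity
    direct: dict = field(default_factory=dict)
    # type name -> the same shared instance, held by reference (not copied)
    propagated: dict = field(default_factory=dict)

    def classification_names(self) -> set:
        return set(self.direct) | set(self.propagated)


def entities_classified_as(vertices, type_name):
    # Answers "get list of entities that are classified as PII" from
    # per-vertex data only, without walking relationships at query time.
    return [v.guid for v in vertices if type_name in v.classification_names()]


pii = ClassificationInstance("PII", {"level": "high"})
term = EntityVertex("term-1", direct={"PII": pii})
column = EntityVertex("col-1", propagated={"PII": pii})
other = EntityVertex("col-2")

# Changing the single instance is visible from every entity that references it.
pii.attributes["level"] = "low"
print(entities_classified_as([term, column, other], "PII"))
```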
If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.
Thanks,
Madhan
From: David Radley <da...@uk.ibm.com>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <mn...@hortonworks.com>
Cc: atlas <de...@atlas.incubator.apache.org>
Subject: Tag propagation
Hi Madhan,
I had a look at the code - I was surprised that the tag propagation was
not in. Is this something you are looking at in the near future? If not I
may need to look into it. I suggest that phase 1 of the tag propagation
implementation should:
- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
- should honour the classification entitytypes - so that we do not get
classifications applied to inappropriate entityTypes
- There is the question about how the propagated classifications would
look in the get entity rest API - I suggest that they appear in the
entities classification with a field indicating that they are derived (and
hence not able to be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassification property in the entity which would contain the
list of derived classifications. Or we could store them as a new type of
edge "propagated classification" edges to the real classification. I like
the edge idea.
If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
What do you think? all the best, David.
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: Tag propagation
Posted by David Radley <da...@uk.ibm.com>.
Hi Mandy,
Thanks for the extra detail, I can see the need to keep BOTH now. I like
your proposal on how to resolve conflicts - I had not seen this; I assume
removing a classification from an entity could enable tags to propagate to
it. I suggest including the proposed tag propagation values on the
relationshipDefs in the wiki,
all the best, David.
From: Mandy Chessell/UK/IBM
To: David Radley/UK/IBM@IBMGB
Cc: dev@atlas.apache.org
Date: 15/01/2018 12:25
Subject: Re: Tag propagation
Hello David,
There is only one instance of a classification allowed on an entity. A
propagated classification cannot override an explicitly set
classification. When it comes to managing conflicts, there is nothing
special about propagated classifications. A new entity, or a new
classification to an entity, or a new relationship needs to be validated
and if it is invalid then the update is rejected. Because the model is
distributed, it is possible that updates in different servers may
conflict and be discovered later as we synchronise metadata between
members of the cohort. These conflicts are reported through the OMRS
Event Protocol and corrected through exception management processes.
In the example of the note log, and assuming we are using the
confidentiality classification defined in area 4 which has a sliding scale
of enums as you state, and the Notelog has an explicit classification of
"internal use" then it would be invalid to add a note that has a higher
value of the classification because the note log's classification is the
high water mark for the note log. So the request to add the confidential
note would be rejected. If the note log did not have any confidentiality
classification then the confidential note could be added and
classification propagation up the hierarchy would be in effect making the
note log confidential.
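The note log rule described above could be sketched as follows. The enum values and their order are taken from the example values mentioned in this thread and are an assumption, as are all function and class names.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Confidentiality(IntEnum):
    # Assumed sliding scale, lowest to highest.
    NO_CLASSIFICATION = 0
    INTERNAL_USE = 1
    CONFIDENTIAL = 2


class InvalidUpdate(Exception):
    pass


def effective_after_adding(
    explicit: Optional[Confidentiality],
    existing_note_levels: Iterable[Confidentiality],
    new_note_level: Confidentiality,
) -> Confidentiality:
    """Return the note log's effective level after adding a note, or reject."""
    if explicit is not None:
        # An explicit classification is the high water mark for the note log
        # and cannot be overridden by propagation from a note.
        if new_note_level > explicit:
            raise InvalidUpdate("note exceeds the note log's explicit classification")
        return explicit
    # No explicit classification: the highest note level propagates up.
    return max([*existing_note_levels, new_note_level])


print(effective_after_adding(None, [], Confidentiality.CONFIDENTIAL).name)
```

With an explicit "internal use" note log, adding a confidential note raises InvalidUpdate; with no explicit classification, the note log becomes confidential in effect.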
The classifications of confidentiality, retention and criticality are
defined as valid for entities that inherit from Referenceable. This is
not a recent change - see model 422. I agree we need to systematically
work through the scenarios. That was the point of my original note on
this topic. The BOTH option was being removed based on thinking through
only 2 use cases that were not representative of the governance
requirements. I came up with 2 counter-examples in a few minutes and I
am sure there are more. I have not found a case yet where the existing
configuration does not work - but I am not confident I have been through
all of the scenarios either.
This function needs a proper design and community review to get it right.
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
From: David Radley/UK/IBM
To: Mandy Chessell/UK/IBM@IBMGB
Cc: dev@atlas.apache.org
Date: 15/01/2018 11:49
Subject: Re: Tag propagation
Hi Mandy,
I think your use cases make sense.
For the first use case, I am not sure what the confidential classification
is here - is it a classification that is shipped with the open types? I
assume that confidentiality would be a classification that has an ordered
set of enumerated values, like "no classification", "internal use",
"confidential". In this case if a NoteEntry and a NoteLog had the
confidentiality classification on but with different values - we would
need to design for what happens; having BOTH on the Attached NoteLogEntry
RelationshipDef does not seem sufficient. Maybe we have an implied
escalation based on the enum order.
For the second case around dataset and datastore, I have the same concern
- how do we determine what we should do when there are different levels of
retention or criticality specified on each entity.
I am also concerned for confidentiality, retention and criticality, I
assume these classifications would be defined as being applicable to
Referenceable or to any entitytype. I am not sure on which
RelationshipDefs these would flow on, but there is a risk that they could
inadvertently propagate more widely than we would like. I think it would
be useful to see all the tag propagation settings proposed for the open
metadata RelationshipDefs, to know these use cases are reasonably addressed. I
suspect we will want to associate classifications with relationshipDefs so
that relationshipDefs can limit which classifications they propagate.
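One way to picture the suggestion above is a relationship definition that carries, next to its propagateTags direction, an optional list of the classifications it is allowed to propagate. The allowed list is a hypothetical addition for illustration only. The direction values other than BOTH (NONE, ONE_TO_TWO, TWO_TO_ONE) are what I understand the propagateTags enum to contain, and the relationship names are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class RelationshipDef:
    name: str
    propagate_tags: str  # "NONE", "ONE_TO_TWO", "TWO_TO_ONE" or "BOTH"
    # Hypothetical: classifications this relationship may propagate.
    # None means no restriction.
    allowed: Optional[FrozenSet[str]] = None


def propagates(rel: RelationshipDef, classification: str, from_end: int) -> bool:
    """True if a classification on end 1 or end 2 flows to the other end."""
    direction_ok = (
        rel.propagate_tags == "BOTH"
        or (rel.propagate_tags == "ONE_TO_TWO" and from_end == 1)
        or (rel.propagate_tags == "TWO_TO_ONE" and from_end == 2)
    )
    if not direction_ok:
        return False
    return rel.allowed is None or classification in rel.allowed


both = RelationshipDef(
    "ExampleDataSetToDataStore", "BOTH", frozenset({"Retention", "Criticality"})
)
one_way = RelationshipDef("ExampleOneWay", "ONE_TO_TWO")
print(propagates(both, "Retention", 1), propagates(both, "Confidentiality", 1))
```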
There is also the idea that we may want to override the classifications
that have been propagated on an individual entity.
I suggest we need mechanisms in addition to BOTH PropagateTags
on a relationshipDef for your use cases.
all the best, David.
From: Mandy Chessell <ma...@uk.ibm.com>
To: dev@atlas.apache.org
Cc: "Madhan Neethiraj" <ma...@apache.org>, "Sarath Subramanian"
<sa...@apache.org>
Date: 15/01/2018 11:12
Subject: Re: Tag propagation
Hello David,
I am not sure how many examples you need. But here are a couple of
patterns ...
When we have a cluster of entities that make up a logical collection of
information - such as a NoteLog and its Notes nested inside (area 1) - and
a classification applied to any one element needs to be propagated both up
and down. For example, making a note log confidential makes all the notes
inside confidential and making any note confidential makes the note log
confidential (but not all of the other notes inside - if the confidential
note is deleted then the note log is no longer confidential). We will see
similar behaviours with the dependency relationships between nested
locations in area 0.
A second example is where the relationship is showing physical
dependencies between entities that need to be respected. For example, the
relationship between DataSet and DataStore (Area 2). If a data set has a
retention classification or criticality classification (area 4) then it
needs to flow to underlying data stores. If the underlying data stores
have confidence classifications then they should propagate to the
DataSets. We will see similar behaviours with the dependency
relationships between server capabilities in area 0.
Make sense?
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
From: David Radley <da...@uk.ibm.com>
To: Mandy Chessell <ma...@uk.ibm.com>
Cc: dev@atlas.apache.org, "Madhan Neethiraj" <ma...@apache.org>,
"Sarath Subramanian" <sa...@apache.org>
Date: 15/01/2018 10:05
Subject: Re: Tag propagation
Hi Mandy,
From what I recall, we discussed some scenarios where we felt tag
propagation would be useful. I think the use cases we are thinking of are
now indicated by the model files that have "propagateTags" set. The
examples include the semanticClassification and the
"hbase_table_column_families" relationships. We had not identified any use
cases we felt were important where BOTH would be useful for a
relationship, so we were thinking of removing that option. Do you have some
relationships that require BOTH in the open types? It would be useful for
me to understand why those relationships need BOTH.
Many thanks, David.
From: Mandy Chessell/UK/IBM
To: dev@atlas.apache.org
Cc: David Radley <da...@uk.ibm.com>, atlas
<de...@atlas.incubator.apache.org>, Sarath Subramanian <sa...@apache.org>
Date: 14/01/2018 13:25
Subject: Re: Tag propagation
Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both
directions. Most metadata relationships are not hierarchical. They are
two-way and different situations will cause different classifications
to flow in each direction. I do not remember the discussion on removing
the BOTH option - but if I missed it I apologise. What is the
justification?
The enforcement of the classification's entity types should not prevent
the propagation of the tag through an entity because it does not support a
tag. Downstream entities may support the tag and need it to be
propagated to them. We need to work through more scenarios because we
also need a way to bound tag propagation :)
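To make the difference between the two behaviours concrete, here is a small sketch of propagation over a downstream graph. With pass_through enabled, an entity whose type does not support the tag is not tagged but still lets the tag flow on. With it disabled, propagation stops at that entity, which is the alternative raised earlier in the thread. The graph, the type names and the function are illustrative assumptions only.

```python
from collections import deque


def propagate(start, downstream, entity_type, applicable_types, pass_through=True):
    """Return the guids that receive the propagated tag, in breadth-first order."""
    tagged = []
    seen = {start}
    queue = deque([start])
    while queue:
        guid = queue.popleft()
        if guid != start:
            if entity_type[guid] in applicable_types:
                tagged.append(guid)
            elif not pass_through:
                # Stop here: do not visit anything beyond this entity.
                continue
        for nxt in downstream.get(guid, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return tagged


# term -> process -> column; the tag is not valid for the process type.
downstream = {"term": ["process"], "process": ["column"]}
entity_type = {"term": "GlossaryTerm", "process": "Process", "column": "hive_column"}
applicable = {"GlossaryTerm", "hive_column"}

print(propagate("term", downstream, entity_type, applicable))
print(propagate("term", downstream, entity_type, applicable, pass_through=False))
```

The seen set also bounds the traversal when relationships form a cycle, which is one small part of the bounding question raised above.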
As an FYI, the OMRS API for classifications includes an origin attribute
that lets us return classifications with an entity that are explicitly
assigned or propagated to the entity. Most callers will not care but some
might.
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
From: Madhan Neethiraj <ma...@apache.org>
To: David Radley <da...@uk.ibm.com>, Sarath Subramanian
<sa...@apache.org>
Cc: atlas <de...@atlas.incubator.apache.org>
Date: 13/01/2018 02:14
Subject: Re: Tag propagation
David,
Sarath was working on tag-propagation, but had to take up tasks related to
JanusGraph and others. He will be resuming tag-propagation work next week;
this feature would be part of the Atlas-1.0.0 release.
- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
Agree.
- should honour the classification entitytypes - so that we do not get
classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the
classification is not applicable? I think it wouldn’t be correct to block
a classification association to an entity if the classification is not
applicable for a down-stream entity.
- There is the question about how the propagated classifications would
look in the get entity rest API - I suggest that they appear in the
entities classification with a field indicating that they are derived (and
hence not able to be removed by an entity update).
I was thinking about a separate attribute,
AtlasEntity.propagatedClassifications, for this. However, I think your
suggestion of adding a field to AtlasClassification is a better one; with
this approach no changes would be needed in applications that process
classifications on an entity. How about we capture the guid of the source
entity on which the classification is associated,
AtlasClassification.sourceEntityGuid? If this value is null, then the
classification is associated with the current entity directly.
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in
Ranger.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassification property in the entity which would contain the
list of derived classifications. Or we could store them as a new type of
edge "propagated classification" edges to the real classification. I like
the edge idea.
To enable queries like ‘get list of entities that are classified as PII’,
it will be performant if each entity vertex has data about the propagated
classifications as well, similar to entities having data on
classifications directly associated with the entity currently. However,
all the entities should directly reference a single instance of a
classification, so that it will be easier to manage changes to
classification attribute values. Sarath will send an update on the design
choices later next week.
If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.
Thanks,
Madhan
Re: Tag propagation
Posted by Mandy Chessell <ma...@uk.ibm.com>.
Hello David,
I am not sure how many examples you need. But here are a couple of
patterns ...
When we have a cluster of entities that make up a logical collection of
information - such as a NoteLog and its Notes nested inside (area 1) - and
a classification applied to any one element needs to be propagated both up
and down. For example, making a note log confidential makes all the notes
inside confidential and making any note confidential makes the note log
confidential (but not all of the other notes inside - if the confidential
note is deleted then the note log is no longer confidential). We will see
similar behaviours with the dependency relationships between nested
locations in area 0.
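The note log behaviour described above can be sketched as a pure function of the explicitly assigned classifications. Only explicit assignments propagate, so a tag that reached the note log from one note does not flow back down to the other notes. The names are hypothetical.

```python
from typing import Dict, Tuple


def effective_confidentiality(
    log_explicit: bool, notes_explicit: Dict[str, bool]
) -> Tuple[bool, Dict[str, bool]]:
    """Return (note log is confidential, {note: is confidential})."""
    # Up: any explicitly confidential note makes the note log confidential.
    log_effective = log_explicit or any(notes_explicit.values())
    # Down: only an explicit classification on the note log reaches the notes.
    notes_effective = {
        note: log_explicit or explicit for note, explicit in notes_explicit.items()
    }
    return log_effective, notes_effective


# One confidential note: the log becomes confidential, the other note does not.
print(effective_confidentiality(False, {"n1": True, "n2": False}))
# After deleting n1 the log is no longer confidential.
print(effective_confidentiality(False, {"n2": False}))
```

Because the result is recomputed from explicit assignments, deleting the confidential note removes the derived classification from the note log without any separate clean-up step.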
A second example is where the relationship is showing physical
dependencies between entities that need to be respected. For example, the
relationship between DataSet and DataStore (Area 2). If a data set has a
retention classification or criticality classification (area 4) then it
needs to flow to underlying data stores. If the underlying data stores
have confidence classifications then they should propagate to the
DataSets. We will see similar behaviours with the dependency
relationships between server capabilities in area 0.
Make sense?
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
Re: Tag propagation
Posted by David Radley <da...@uk.ibm.com>.
Hi Mandy,
From what I recall, we discussed some scenarios where we felt tag
propagation would be useful. I think the use cases we are thinking of are
now indicated by the model files that have "propagateTags" set. The
examples include the semanticClassification and the
"hbase_table_column_families" relationships. We had not identified any use
cases we felt were important where BOTH would be useful for a
relationship, so we were thinking of removing that option. Do you have some
relationships that require BOTH in the open types? It would be useful for
me to understand why those relationships need BOTH.
Many thanks, David.
From: Mandy Chessell/UK/IBM
To: dev@atlas.apache.org
Cc: David Radley <da...@uk.ibm.com>, atlas
<de...@atlas.incubator.apache.org>, Sarath Subramanian <sa...@apache.org>
Date: 14/01/2018 13:25
Subject: Re: Tag propagation
Hello Madhan, David,
I would not wish to remove the option to have tag propagation flow in both
directions. Most metadata relationships are not hierarchical. They are
two-way and different situations will cause for different classifications
to flow in each direction. I do not remember the discussion on removing
the BOTH open - but if I missed it I apologise. What is the
justification?
The enforcement of the classification's entity types should not prevent
the propagation of the tag through an entity because it does not support a
tag. Down stream entities may support the tag and need it to be
propagated to them. We need to work through more scenarios because we
also need a way to bound tag propagation :)
As an FYI, the OMRS API for classifications includes an origin attribute
that lets us return classifications with an entity that are explicitly
assigned or propagated to the entity. Most callers will not care but some
might.
All the best
Mandy
___________________________________________
Mandy Chessell CBE FREng CEng FBCS
IBM Distinguished Engineer
Master Inventor
Member of the IBM Academy of Technology
Visiting Professor, Department of Computer Science, University of
Sheffield
Email: mandy_chessell@uk.ibm.com
LinkedIn: http://www.linkedin.com/pub/mandy-chessell/22/897/a49
Assistant: Janet Brooks - jsbrooks12@uk.ibm.com
From: Madhan Neethiraj <ma...@apache.org>
To: David Radley <da...@uk.ibm.com>, Sarath Subramanian
<sa...@apache.org>
Cc: atlas <de...@atlas.incubator.apache.org>
Date: 13/01/2018 02:14
Subject: Re: Tag propagation
David,
Sarath was working on tag-propagation, but had to take up tasks related to
JanusGraph and others. He will be resuming tag-propagation work next week;
this feature would be part of Atlas-1.0.0 release.
- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
Agree.
- should honour the classification entitytypes - so that we do not get
classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the
classification is not applicable? I think it wouldn’t be correct to block
a classification association to an entity if the classification is not
applicable for a down-stream entity.
- There is the question about how the propagated classifications would
look in the get entity rest API - I suggest that they appear in the
entities classification with a field indicating that they are derived (and
hence not able to be removed by an entity update).
I was thinking about a separate attribute,
AtlasEntity.propagatedClassifications, for this. However, I think your
suggestion of adding a field to AtlasClassification is a better one; with
this approach no changes would be needed in applications that process
classifications on an entity. How about we capture the guid of the source
entity on which the classification is associated,
AtlasClassification.sourceEntityGuid? If this value is null, then the
classification is associated with the current entity directly.
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in
Ranger.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassifiation property in the entity which would contain the
list of derived classifications. Or we could store them as a new type of
edge "propagated classification" edges to the real classification. I like
the edge idea.
To enable queries like ‘get list of entities that are classified as PII’,
it would be more performant if each entity vertex also had data about the
propagated classifications, similar to how entities currently have data on
the classifications directly associated with them. However, all the
entities should directly reference a single instance of a
classification, so that it will be easier to manage changes to
classification attribute values. Sarath will send an update on the design
choices later next week.
If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.
Thanks,
Madhan
From: David Radley <da...@uk.ibm.com>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <mn...@hortonworks.com>
Cc: atlas <de...@atlas.incubator.apache.org>
Subject: Tag propagation
Hi Madhan,
I had a look in the code - I was surprised that the tag propagation was
not in. Is this something you are looking at in the near future? If not I
may need to look into it. I suggest the tag propagation implementation
for phase 1 should:
- lose BOTH - this is still in the code - I think we agreed we wanted to
get rid of this.
- should honour the classification entitytypes - so that we do not get
classifications applied to inappropriate entityTypes
- There is the question about how the propagated classifications would
look in the get-entity REST API - I suggest that they appear in the
entity's classifications with a field indicating that they are derived (and
hence cannot be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using
the existing tag sync.
- I think you wanted the derived classifications to be picked up at query
time. I also remember suggesting that we store the derived classifications
in a derivedClassification property in the entity which would contain the
list of derived classifications. Or we could store them as a new type of
edge "propagated classification" edges to the real classification. I like
the edge idea.
If we had the above, we could classify a Term as PSI, and use the semantic
mapping to propagate the classifications to the hive column. The hive
column would not pick up classifications defined in the area 3 model like
"SpineObject", which is defined as only applying to "GlossaryTerm".
What do you think? all the best, David.
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
Re: Tag propagation
Posted by Madhan Neethiraj <ma...@apache.org>.
David,
Sarath was working on tag propagation, but had to take up tasks related to JanusGraph and others. He will be resuming the tag-propagation work next week; this feature would be part of the Atlas 1.0.0 release.
- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.
Agree.
- should honour the classification entitytypes - so that we do not get classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the classification is not applicable? I think it wouldn’t be correct to block a classification association to an entity if the classification is not applicable for a down-stream entity.
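To make the stop-at-the-inapplicable-entity idea concrete, here is a minimal sketch in Python (not Atlas code). The function name, its parameters, and the plain dictionaries standing in for the graph are all hypothetical. It also assumes that an empty set of applicable entity types means "applies to every type".

```python
from collections import deque


def propagate_targets(source_guid, applicable_types, entity_types, downstream):
    """Return the guids that would receive a propagated classification.

    source_guid      -- entity the classification is directly associated with
    applicable_types -- entity types the classification may apply to
                        (assumption: an empty set means "any type")
    entity_types     -- hypothetical lookup: guid -> entity type name
    downstream       -- hypothetical lookup: guid -> list of downstream guids
    """
    reached = []
    seen = {source_guid}
    queue = deque(downstream.get(source_guid, []))
    while queue:
        guid = queue.popleft()
        if guid in seen:
            continue
        seen.add(guid)
        if applicable_types and entity_types[guid] not in applicable_types:
            # Not applicable here: stop propagating along this path,
            # but do not block the association on the source entity.
            continue
        reached.append(guid)
        queue.extend(downstream.get(guid, []))
    return reached


entity_types = {"t1": "GlossaryTerm", "c1": "hive_column",
                "p1": "hive_process", "c2": "hive_column"}
downstream = {"t1": ["c1"], "c1": ["p1"], "p1": ["c2"]}
targets = propagate_targets("t1", {"GlossaryTerm", "hive_column"},
                            entity_types, downstream)
```

In this sketch the classification on t1 stays in place, c1 picks it up, and propagation stops at p1, so c2 is never reached through that path.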
- There is the question about how the propagated classifications would look in the get-entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence cannot be removed by an entity update).
I was thinking about a separate attribute, AtlasEntity.propagatedClassifications, for this. However, I think your suggestion of adding a field to AtlasClassification is a better one; with this approach no changes would be needed in applications that process classifications on an entity. How about we capture the guid of the source entity on which the classification is associated, AtlasClassification.sourceEntityGuid? If this value is null, then the classification is associated with the current entity directly.
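A rough sketch of how the proposed field could behave, written in Python for brevity rather than as the actual Atlas Java model. The class below is a stand-in, not AtlasClassification itself; source_entity_guid mirrors the proposed sourceEntityGuid, and the two helper methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Classification:
    """Stand-in for a classification as returned on an entity."""
    type_name: str
    # Mirrors the proposed sourceEntityGuid: None means the classification
    # is associated with the current entity directly.
    source_entity_guid: Optional[str] = None

    def is_propagated(self) -> bool:
        return self.source_entity_guid is not None

    def removable_by_entity_update(self) -> bool:
        # Hypothetical rule from the thread: derived classifications
        # cannot be removed by updating the entity that inherited them.
        return not self.is_propagated()


classifications = [
    Classification("PII"),                              # direct
    Classification("Confidential", source_entity_guid="t1"),  # propagated
]
# Existing consumers can keep iterating over one list of classifications.
names = [c.type_name for c in classifications]
removable = [c.type_name for c in classifications
             if c.removable_by_entity_update()]
```

Because propagated and direct classifications share one list, code that only reads type names needs no change; only code that removes classifications has to look at the new field.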
- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in Ranger.
- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge "propagated classification" edges to the real classification. I like the edge idea.
To enable queries like ‘get list of entities that are classified as PII’, it would be more performant if each entity vertex also had data about the propagated classifications, similar to how entities currently have data on the classifications directly associated with them. However, all the entities should directly reference a single instance of a classification, so that it will be easier to manage changes to classification attribute values. Sarath will send an update on the design choices later next week.
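The two goals above (fast lookup per entity vertex, one shared classification instance) can be pictured with a small Python sketch. This is only an illustration under assumed names; it does not reflect the Atlas graph schema or whatever design Sarath proposes.

```python
from dataclasses import dataclass, field


@dataclass
class ClassificationInstance:
    """A single classification instance shared by every entity it reaches."""
    type_name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class EntityVertex:
    """Hypothetical entity vertex holding direct and propagated references."""
    guid: str
    direct: list = field(default_factory=list)
    propagated: list = field(default_factory=list)

    def classification_names(self):
        # Data kept on the vertex itself, so a "classified as X" query
        # does not need to walk the graph.
        return {c.type_name for c in self.direct + self.propagated}


def entities_classified_as(vertices, type_name):
    return [v.guid for v in vertices if type_name in v.classification_names()]


pii = ClassificationInstance("PII", {"level": "high"})
term = EntityVertex("t1", direct=[pii])
column = EntityVertex("c1", propagated=[pii])   # same instance, not a copy
other = EntityVertex("x1")

matches = entities_classified_as([term, column, other], "PII")

# Changing an attribute once is visible through every referencing entity.
pii.attributes["level"] = "low"
level_seen_from_column = column.propagated[0].attributes["level"]
```

Here both t1 and c1 match the query, and the attribute change made on the single instance shows up when read through c1.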
If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm".
Yes. This use case should be covered by the design discussed above.
Thanks,
Madhan
From: David Radley <da...@uk.ibm.com>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <mn...@hortonworks.com>
Cc: atlas <de...@atlas.incubator.apache.org>
Subject: Tag propagation
Hi Madhan,
I had a look in the code - I was surprised that the tag propagation was not in. Is this something you are looking at in the near future? If not I may need to look into it. I suggest the tag propagation implementation for phase 1 should:
- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.
- should honour the classification entitytypes - so that we do not get classifications applied to inappropriate entityTypes
- There is the question about how the propagated classifications would look in the get-entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence cannot be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.
- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge "propagated classification" edges to the real classification. I like the edge idea.
If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm".
What do you think? all the best, David.
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU