
Posted to user@accumulo.apache.org by Edmon Begoli <eb...@gmail.com> on 2012/08/08 22:08:34 UTC

Security and data design advice on structuring data on accumulo

I am trying to model a healthcare claim on Accumulo, and I want to
lay it out so that it:

A. Accurately reflects the structure of the claim

B. Lets me apply fine-grained controls to different sections of the document

I am simplifying matters, but a claim contains claim document identifiers,
the demographics of the patient, and line items for the procedures
performed:

claim identifier, date submitted, date processed, state of origin, ...
patient name, dob, location, other identifiers
procedure 1 code, procedure 1 provider, procedure 1 cost, ...
...
procedure n code, procedure n provider, procedure n cost, ...


Patient demographic fields are PHI (personal health information), and
these should not be visible to everyone who wants to perform analysis,
but only to main administrators, the patient, and maybe the physician. I
assume these would need a separate authorization label.

Other fields may be visible to different groups of people, e.g.
federal claim administrators can see everything, but regional offices can
only see their own states. These would get separate, more permissive
labels.
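A minimal sketch of the layout described above, assuming a hypothetical scheme in which the claim ID is the row, each claim section (header, patient, one family per procedure line) is a column family, each field is a column qualifier, and each cell carries a visibility label. The label names (`PHI`, `federalAdmin`, `region_TN`) and the `Cell` class are illustrative assumptions, and the code does not use the Accumulo client API. Its `canSee` check only understands a single term or a flat `|` disjunction, whereas real Accumulo visibility expressions also support `&` and parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical layout of one claim as Accumulo-style cells:
// row = claim id, column family = claim section, column qualifier = field,
// plus a column visibility label. This does not use the Accumulo client API.
public class ClaimCells {
    static final class Cell {
        final String row, family, qualifier, visibility, value;

        Cell(String row, String family, String qualifier, String visibility, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + value;
        }
    }

    // Simplified check that only understands a single term or terms joined by '|'.
    // Real Accumulo visibility expressions also allow '&' and parentheses.
    static boolean canSee(String visibility, Set<String> auths) {
        for (String term : visibility.split("\\|")) {
            if (auths.contains(term.trim())) {
                return true;
            }
        }
        return false;
    }

    // Sample claim with hypothetical identifiers and labels; values are placeholders.
    static List<Cell> sampleClaim() {
        String row = "claim_0001";
        String regional = "federalAdmin|region_TN"; // federal admins see all, TN office sees TN
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(row, "header", "stateOfOrigin", regional, "TN"));
        cells.add(new Cell(row, "header", "dateSubmitted", regional, "(date)"));
        cells.add(new Cell(row, "patient", "name", "PHI", "(patient name)"));
        cells.add(new Cell(row, "procedure_001", "code", regional, "(procedure code)"));
        cells.add(new Cell(row, "procedure_001", "cost", regional, "(cost)"));
        return cells;
    }

    static List<Cell> visibleTo(Set<String> auths) {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : sampleClaim()) {
            if (canSee(c.visibility, auths)) {
                visible.add(c);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // A Tennessee regional-office user sees the header and procedure cells, not the PHI cell.
        for (Cell c : visibleTo(Set.of("region_TN"))) {
            System.out.println(c);
        }
    }
}
```

Under these assumptions a Tennessee regional-office user sees four of the five cells; only a user holding `PHI` sees the patient name.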

Finally, it might make sense to "elevate" some fields for easy access
and analysis, e.g. diagnostic codes, zip code, and cost.
This would not be a matter of labels, but of data design.
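One common way to "elevate" fields is a separate index table keyed by field value that points back to claim rows. The sketch below assumes a hypothetical `field:value\0claimId` key format and uses a `TreeMap` in place of a sorted Accumulo table, so a prefix walk stands in for a range scan. Each index entry copies the visibility label of the field it indexes; an index entry with a weaker label would leak the very values the labels protect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical index table for "elevated" fields. A sorted map stands in for the
// Accumulo table: key = "field:value\0claimId", value = visibility copied from the source cell.
public class ClaimIndex {
    private final NavigableMap<String, String> index = new TreeMap<>();

    void add(String field, String value, String claimId, String visibility) {
        index.put(field + ":" + value + "\0" + claimId, visibility);
    }

    // Emulates a prefix range scan, keeping only entries whose flat '|' label the caller satisfies.
    List<String> lookup(String field, String value, Set<String> auths) {
        String prefix = field + ":" + value + "\0";
        List<String> claimIds = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // sorted order: we have passed the end of the prefix range
            }
            for (String term : e.getValue().split("\\|")) {
                if (auths.contains(term)) {
                    claimIds.add(e.getKey().substring(prefix.length()));
                    break;
                }
            }
        }
        return claimIds;
    }

    public static void main(String[] args) {
        ClaimIndex idx = new ClaimIndex();
        idx.add("zip", "37831", "claim_0001", "federalAdmin|region_TN");
        idx.add("zip", "37831", "claim_0002", "federalAdmin|region_TN");
        idx.add("state", "TN", "claim_0001", "federalAdmin|region_TN");
        System.out.println(idx.lookup("zip", "37831", Set.of("region_TN"))); // [claim_0001, claim_0002]
        System.out.println(idx.lookup("zip", "37831", Set.of("region_KY"))); // []
    }
}
```

Values are compared as strings here, so numeric fields such as cost would need a sortable encoding (e.g. zero-padding) before range queries over them make sense.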


With all this in mind, I would welcome any security and data design
suggestions.

Re: Security and data design advice on structuring data on accumulo

Posted by Edmon Begoli <eb...@gmail.com>.

Adam,

The role-label categorization you proposed is very close to, if not an
exact representation of, the need:

(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)
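To make the expression concrete, here is a small self-contained Java evaluator for Accumulo-style visibility expressions: unquoted terms combined with `&`, `|`, and parentheses. It is a simplified re-implementation written for illustration, not Accumulo's own `ColumnVisibility`/`VisibilityEvaluator` code; it skips quoted terms, but it does follow Accumulo's rule that `&` and `|` cannot be mixed at one nesting level without parentheses.

```java
import java.util.Set;

// Simplified evaluator for Accumulo-style visibility expressions, for illustration only.
// Not Accumulo's ColumnVisibility/VisibilityEvaluator: quoted terms are not supported.
public class VisibilityExpr {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilityExpr(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    public static boolean evaluate(String expr, Set<String> auths) {
        if (expr.isEmpty()) {
            return true; // an empty visibility is readable by everyone
        }
        VisibilityExpr parser = new VisibilityExpr(expr, auths);
        boolean result = parser.parseExpr();
        if (parser.pos != expr.length()) {
            throw new IllegalArgumentException("unexpected character at " + parser.pos);
        }
        return result;
    }

    // expr := operand ('&' operand)* | operand ('|' operand)*   (no mixing without parentheses)
    private boolean parseExpr() {
        boolean value = parseOperand();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && c != op) {
                throw new IllegalArgumentException("cannot mix & and | without parentheses");
            }
            op = c;
            boolean rhs = parseOperand(); // always parse, so syntax errors are still caught
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // operand := '(' expr ')' | term
    private boolean parseOperand() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean value = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ')' at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < expr.length() && isTermChar(expr.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a term at " + start);
        }
        return auths.contains(expr.substring(start, pos));
    }

    // Characters allowed in unquoted Accumulo label terms: letters, digits and _-:./
    private static boolean isTermChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }

    public static void main(String[] args) {
        String cv = "(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)";
        System.out.println(evaluate(cv, Set.of("adamsHealthTeam", "regularCheckup")));       // true
        System.out.println(evaluate(cv, Set.of("adamsHealthTeam")));                         // false
        System.out.println(evaluate(cv, Set.of("massStateResearcher", "populationStudy"))); // true
    }
}
```

A member of the health team alone is not enough; the access-reason label (`regularCheckup` or `illnessEvaluation`) must also be present, which is exactly the conjunction of role and access-time labels proposed in the thread.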

The challenge is to create and manage a hierarchy that is almost
completely personalized, similar to OAuth/OpenID permissions managed by
Google, but in this case represented through labels.


On Fri, Aug 10, 2012 at 9:02 AM, Adam Fuchs <af...@apache.org> wrote:
> I guess I should have specified that the access time labels should be used
> in conjunction with the role labels, like
> "(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)".
>
> Adam
>
> On Aug 10, 2012 8:56 AM, "Benson Margulies" <bi...@gmail.com> wrote:
>>
>> On Fri, Aug 10, 2012 at 8:52 AM, Adam Fuchs <af...@apache.org> wrote:
>> > Not sure I understand why this gets into n*m roles. Can you elaborate?
>> >
>> > The question of when your physician should have access seems like it
>> > could be represented by just a few labels, like "regularCheckup",
>> > "illnessEvaluation", and "populationStudy". Those labels could then be
>> > tied to an auditing system that could verify the appropriateness of
>> > access over time.
>>
>> And if you change doctors? Maybe that's a job for some sort of role/group
>> model.
>>
>>
>> >
>> > Adam
>> >
>> > On Aug 9, 2012 10:19 PM, "Josh Elser" <jo...@gmail.com> wrote:
>> >>
>> >> I've thought quite a bit about the approach you've outlined
>> >> previously.
>> >>
>> >> The main caveat I've always struggled to overcome is how to
>> >> encapsulate *when* a physician should have access to your records.
>> >> This expands the problem into n*m roles, which become difficult to
>> >> manage inside Accumulo, especially as time elapses.
>> >>
>> >> On 8/8/2012 6:29 PM, Marc Parisi wrote:
>> >>>
>> >>> Just some ideas and thoughts...
>> >>>
>> >>> With a system I'm building, I have code to take care of user roles.
>> >>> Roles will define visibilities, how analysis is performed,
>> >>> information sharing, etc. I have a particular role for sharing. I
>> >>> also have an area of interest, usually assigned to a physician role,
>> >>> so only a physician's office can see certain data from it. The data
>> >>> corresponding to a given person can be accessed by that person (if
>> >>> they have app access), the physician who created it, and other
>> >>> physicians (with a different area of interest) with whom the user
>> >>> wants to share their data. Each area of interest will be
>> >>> cryptographically secured. Our approach will utilize multiple crypto
>> >>> technologies. I would suggest making crypto your last stop. Focus on
>> >>> getting the visibility hierarchy designed. HIPAA requirements can
>> >>> come later.
>> >>>
>> >>> In my approach, there is no elevation of fields per se. Instead,
>> >>> there are visibilities for all assigned parties, so in my case it is
>> >>> a matter of labeling. The data can have hierarchies, and each
>> >>> hierarchy has different labels to control access.
>> >>>
>> >>> " Patient demographic fields are PHI (personal health information)
>> >>> and these should not be visible to all who want to perform
>> >>> analysis, but only to main administrators, patient and maybe
>> >>> physician. I assume these would have to have separate
>> >>> authorization label. "
>> >>>
>> >>> Yes. I think this is where roles will help. Assign roles and
>> >>> visibilities to those roles. As of right now, I'm putting ephemeral
>> >>> data in my visibilities (user ID for a physician, among other
>> >>> things). I will probably move this to the qualifier and take a
>> >>> simpler approach to visibilities.
>> >>>
>> >>> Each role has different actions. Right now I have four actions:
>> >>> syncing, querying, deleting, and sharing. You don't have to capture
>> >>> actions, but you might want to limit how the roles of users vary,
>> >>> and I think modeling the security actions within each role is an
>> >>> excellent way to do so.
>> >>>
>> >>>
>> >>> On Wed, Aug 8, 2012 at 4:08 PM, Edmon Begoli <ebegoli@gmail.com
>> >>> <ma...@gmail.com>> wrote:
>> >>>
>> >>>     I am trying to model the healthcare claim on accumulo and I want
>> >>> to
>> >>>     lay it out so that it:
>> >>>
>> >>>     A. Accurately reflects the structure of the claim
>> >>>
>> >>>     B. I could have controls finely applied to different sections of
>> >>> the
>> >>>     document
>> >>>
>> >>>     I am simplifying matter but claim contains claim document
>> >>> identifiers,
>> >>>     demographics of the patient, and line items for the procedures
>> >>>     performed:
>> >>>
>> >>>     claim identifier, data submitted, data processed, state of origin,
>> >>> ...
>> >>>     patient name, dob, location, other identifiers
>> >>>     procedure 1 code, procedure 1 provider, procedure 1 cost, ...
>> >>>     ...
>> >>>     procedure n code, procedure n provider, procedure n cost, ...
>> >>>
>> >>>
>> >>>     Patient demographic fields are PHI (personal health information)
>> >>> and
>> >>>     these should not be visible to all who want to perform analysis,
>> >>> but
>> >>>     only to main administrators,
>> >>>     patient and maybe physician. I assume these would have to have
>> >>>     separate authorization label.
>> >>>
>> >>>     Other fields may be visible to different groups of people - i.e.
>> >>>     federal claim administrators can see all, but  regional offices
>> >>> can
>> >>>     only see their states.
>> >>>     Separate, more permissive labels.
>> >>>
>> >>>     Finally, it might make sense to "elevate" some fields for easy
>> >>> access
>> >>>     and analysis - ie. diagnostic codes, zip code, cost.
>> >>>     This would not be a matter of labels, but data design.
>> >>>
>> >>>
>> >>>     With all this in mind, I would welcome if anyone has any security
>> >>> and
>> >>>     data design suggestions.
>> >>>
>> >>>
>> >

Re: Security and data design advice on structuring data on accumulo

Posted by Edmon Begoli <eb...@gmail.com>.

Actually, time-based labels would be of great utility.

In healthcare, and very likely in other domains as well, there is the
concept of an episode of care: a set of medical procedures and
interventions that occur as one logical event (think knee surgery and
the associated therapy, etc.) but that connect different physicians,
offices, etc. Not everyone involved in an episode of care has to see all
of the patient's data all of the time.
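One way to approximate episode-of-care labels, sketched under assumptions: each cell belonging to an episode carries a hypothetical `episode_<id>` term in its visibility, and an external grant service hands that term to a user only while their involvement in the episode is active. Accumulo itself does no time reasoning here; it only checks whatever authorizations the caller presents at scan time. The `Grant` class, user names, and dates are illustrative.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical external grant service: a user holds an episode label only during a date window.
public class EpisodeGrants {
    static final class Grant {
        final String user, episodeId;
        final LocalDate from, until; // inclusive window

        Grant(String user, String episodeId, LocalDate from, LocalDate until) {
            this.user = user;
            this.episodeId = episodeId;
            this.from = from;
            this.until = until;
        }
    }

    private final List<Grant> grants = new ArrayList<>();

    void grant(String user, String episodeId, LocalDate from, LocalDate until) {
        grants.add(new Grant(user, episodeId, from, until));
    }

    // Labels this user should present as scan authorizations on the given date.
    Set<String> authorizationsFor(String user, LocalDate today) {
        Set<String> auths = new TreeSet<>();
        for (Grant g : grants) {
            if (g.user.equals(user) && !today.isBefore(g.from) && !today.isAfter(g.until)) {
                auths.add("episode_" + g.episodeId);
            }
        }
        return auths;
    }

    public static void main(String[] args) {
        EpisodeGrants svc = new EpisodeGrants();
        // e.g. a physical therapist involved only in the rehabilitation part of a knee-surgery episode
        svc.grant("therapist1", "knee42", LocalDate.of(2012, 9, 1), LocalDate.of(2012, 12, 31));
        System.out.println(svc.authorizationsFor("therapist1", LocalDate.of(2012, 10, 15))); // [episode_knee42]
        System.out.println(svc.authorizationsFor("therapist1", LocalDate.of(2013, 2, 1)));   // []
    }
}
```

Keeping the time window outside the stored labels means the data never has to be relabeled when an episode ends; only the grant table changes.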

On Fri, Aug 10, 2012 at 9:54 AM, David Medinets
<da...@gmail.com> wrote:
>> The question of when your physician should have access seems..
>
> Does it make sense to add time-based labels? So there might be a
> label called "Teen Years" or "First Disease" or "2011 First Quarter".

Re: Security and data design advice on structuring data on accumulo

Posted by David Medinets <da...@gmail.com>.

> The question of when your physician should have access seems..

Does it make sense to add time-based labels? So there might be a
label called "Teen Years" or "First Disease" or "2011 First Quarter".

Re: Security and data design advice on structuring data on accumulo

Posted by Christopher Tubbs <ct...@gmail.com>.

I think an important takeaway here (so far) is that you can't just use
"doctor" as a role, because that doesn't encapsulate all the security
considerations. Doctor X doesn't get to see patient Y's data unless X is
Y's doctor or Y has signed a release for him/her to see it. So
"doctorOf<Y>" is an essential consideration. If this were all that was
encapsulated, then the labels would grow roughly linearly with the number
of patients, not the number of "users" (if patients happen to be users,
that's simply a coincidence).

Since patient privacy is primarily what is being protected, I'd make
the roles relative to the patient:
doctorOfPatientX
familyMemberOfPatientX
isPatientX
lawyerOfPatientX
insurerOfPatientX
nurseOfPatientX
etc...

So the roles would scale as n*m, where n is the number of patients and
m is the size of a roughly fixed set of roles relative to each patient
(m should be pretty small).
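A minimal sketch of the patient-relative labels listed above, assuming a hypothetical naming scheme `<role>Patient<id>` (so `doctorOf` + `17` gives `doctorOfPatient17`, and `is` + `17` gives `isPatient17`). Unquoted Accumulo label terms are limited to letters, digits, and `_-:./`, so patient IDs would need to fit that alphabet. Enumerating every label also makes the n*m growth explicit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical patient-relative labels in the style "doctorOfPatientX", "isPatientX", ...
public class PatientRoles {
    // m: a small, fixed set of relationships to a patient
    static final List<String> ROLES =
            List.of("doctorOf", "familyMemberOf", "is", "lawyerOf", "insurerOf", "nurseOf");

    static String label(String role, String patientId) {
        return role + "Patient" + patientId; // e.g. "doctorOfPatient17", "isPatient17"
    }

    // Visibility for a cell that any of the given relationships may read.
    static String visibilityFor(String patientId, List<String> allowedRoles) {
        List<String> terms = new ArrayList<>();
        for (String role : allowedRoles) {
            terms.add(label(role, patientId));
        }
        return String.join("|", terms);
    }

    // Every label the system could need: n patients times m roles.
    static List<String> allLabels(List<String> patientIds) {
        List<String> labels = new ArrayList<>();
        for (String p : patientIds) {
            for (String r : ROLES) {
                labels.add(label(r, p));
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(visibilityFor("17", List.of("is", "doctorOf"))); // isPatient17|doctorOfPatient17
        System.out.println(allLabels(List.of("17", "18", "19")).size());   // 18
    }
}
```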

You could put the patient in the row, but then you're relying on an
external system to filter the data (constrain the query) based on
roles *that* system understands. The built-in Accumulo roles would
simply constrain that external query system.
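If the patient ID goes in the row instead, the external query layer decides which rows a user may touch, and Accumulo's labels only apply on top of that. The sketch assumes hypothetical rows of the form `<patientId>:<claimId>` and a user-to-patient relationship map maintained outside Accumulo; a `TreeMap` stands in for the table, and each prefix walk corresponds to what would be one range per related patient in a real scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical external query layer. Rows are "<patientId>:<claimId>"; the layer restricts a
// user's scan to the patients it knows they are related to, before any Accumulo label check.
public class PatientRowFilter {
    static List<String> constrain(String user, Map<String, Set<String>> patientsByUser,
                                  TreeMap<String, String> table) {
        List<String> rows = new ArrayList<>();
        // Sorted iteration so results come back in row order.
        for (String patientId : new TreeSet<>(patientsByUser.getOrDefault(user, Set.of()))) {
            String prefix = patientId + ":";
            // Equivalent of one prefix range scan per related patient.
            for (String row : table.tailMap(prefix, true).keySet()) {
                if (!row.startsWith(prefix)) {
                    break;
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("p17:claim_0001", "...");
        table.put("p17:claim_0002", "...");
        table.put("p18:claim_0003", "...");
        Map<String, Set<String>> patientsByUser = Map.of("drX", Set.of("p17"));
        System.out.println(constrain("drX", patientsByUser, table)); // [p17:claim_0001, p17:claim_0002]
        System.out.println(constrain("drY", patientsByUser, table)); // []
    }
}
```

The trade-off is that the relationship map now carries the security decision, so Accumulo can only enforce whatever coarser labels sit on top of it.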

On Fri, Aug 10, 2012 at 1:00 PM, Marc Parisi <ma...@accumulo.net> wrote:
> My suggestion of roles was to have a finite number of roles, with a finite
> number of actions. You would only store auths for those roles and actions.
> Another lookup mechanism in my system will determine which user to use (as
> I recall; I don't have the code in front of me). I did mention something
> about putting an ID (a key ID, perhaps) in the CV (column visibility);
> however, this could be moved elsewhere.
>
> Doctor is a role. Dr. Parisi is not a role; it's a lookup to see whether
> Parisi is a doctor, and if so, to use that user (role). The doctor user
> would have the CV to see the user visibility. With the cryptographic hash
> in the CV, the goal was to limit which patients a doctor could see, but I
> could just as easily put that in the row to enforce that limitation.
>
> Hopefully that makes sense.
>
> On Fri, Aug 10, 2012 at 12:33 PM, Edmon Begoli <eb...@gmail.com> wrote:
>>
>> > But that's not really n*m, since it only specifies me by name. This
>> > should be roughly linear with users, no?
>>
>> Correct.
>>
>> On Fri, Aug 10, 2012 at 12:05 PM, Adam Fuchs <af...@apache.org> wrote:
>> > But that's not really n*m, since it only specifies me by name. This
>> > should be roughly linear with users, no?
>> >
>> > There is definitely a reliance on some external service managing the
>> > roles that docs are in, but this should be tractable.
>> >
>> > Adam
>> >
>> > On Aug 10, 2012 11:56 AM, "Josh Elser" <jo...@gmail.com> wrote:
>> >>
>> >> That's what I meant, user*doctors.
>> >>
>> >> It's not enough to say "healthteam"; you have to qualify it by user
>> >> too: "adamhealthteam".

Re: Security and data design advice on structuring data on accumulo

Posted by Marc Parisi <ma...@accumulo.net>.

My suggestion of roles was to have a finite number of roles, with a finite
number of actions. You would only store auths for those roles and actions.
Another lookup mechanism in my system will determine which user to use (as
I recall; I don't have the code in front of me). I did mention something
about putting an ID (a key ID, perhaps) in the CV (column visibility);
however, this could be moved elsewhere.

Doctor is a role. Dr. Parisi is not a role; it's a lookup to see whether
Parisi is a doctor, and if so, to use that user (role). The doctor user
would have the CV to see the user visibility. With the cryptographic hash
in the CV, the goal was to limit which patients a doctor could see, but I
could just as easily put that in the row to enforce that limitation.

Hopefully that makes sense.
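Marc's description (a finite set of roles, each with a finite set of actions, plus a separate lookup from a person to a role) might be sketched as follows. The action names follow the four he listed earlier (syncing, querying, deleting, sharing); the person-to-role map, the role-to-authorizations table, and every identifier are hypothetical.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical finite role model: people map to roles, roles map to actions and scan labels.
public class RoleLookup {
    enum Action { SYNC, QUERY, DELETE, SHARE }

    static final class Role {
        final String name;
        final Set<Action> actions;
        final Set<String> authorizations;

        Role(String name, Set<Action> actions, Set<String> authorizations) {
            this.name = name;
            this.actions = actions;
            this.authorizations = authorizations;
        }
    }

    static final Map<String, Role> ROLES = Map.of(
            "doctor", new Role("doctor", EnumSet.of(Action.QUERY, Action.SHARE), Set.of("clinical")),
            "patient", new Role("patient", EnumSet.of(Action.SYNC, Action.QUERY, Action.SHARE), Set.of("self")));

    // Lookup kept outside Accumulo: "Dr. Parisi" is not a role, but maps to one.
    static final Map<String, String> ROLE_OF_PERSON = Map.of("parisi", "doctor", "jdoe", "patient");

    static Optional<Role> roleFor(String person) {
        return Optional.ofNullable(ROLE_OF_PERSON.get(person)).map(ROLES::get);
    }

    static boolean may(String person, Action action) {
        return roleFor(person).map(r -> r.actions.contains(action)).orElse(false);
    }

    public static void main(String[] args) {
        System.out.println(may("parisi", Action.QUERY));            // true
        System.out.println(may("parisi", Action.DELETE));           // false
        System.out.println(roleFor("parisi").get().authorizations); // [clinical]
    }
}
```

Only the small `ROLES` table would translate into Accumulo authorizations; the person-to-role lookup can change (for example, when a patient switches doctors) without touching stored labels.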


Re: Security and data design advice on structuring data on accumulo

Posted by Edmon Begoli <eb...@gmail.com>.

> But that's not really n*m, since it only specifies me by name. This should
> be roughly linear with users, no?

Correct.

On Fri, Aug 10, 2012 at 12:05 PM, Adam Fuchs <af...@apache.org> wrote:
> But that's not really n*m, since it only specifies me by name. This should
> be roughly linear with users, no?
>
> There is definitely a reliance on some external service managing the roles
> that docs are in, but this should be tractable.
>
> Adam
>
> On Aug 10, 2012 11:56 AM, "Josh Elser" <jo...@gmail.com> wrote:
>>
>> That's what I meant, user*doctors.
>>
>> It's not enough to say "healthteam", you have to qualify it by user too:
>> "adamhealthteam".
>>
>> On 8/10/12 9:02 AM, Adam Fuchs wrote:
>>
>> I guess I should have specified that the access time labels should be used
>> in conjunction with the role labels, like
>> "(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)".
>>
>> Adam
>>
>> On Aug 10, 2012 8:56 AM, "Benson Margulies" <bi...@gmail.com> wrote:
>>>
>>> On Fri, Aug 10, 2012 at 8:52 AM, Adam Fuchs <af...@apache.org> wrote:
>>> > Not sure I understand why this gets into n*m roles. Can you elaborate?
>>> >
>>> > The question of when your physician should have access seems like it
>>> > could
>>> > be represented by just a few labels, like "regularCheckup",
>>> > "illnessEvaluation", and "populationStudy". Those labels could then be
>>> > tied
>>> > to an auditing system that could verify appropriateness of access over
>>> > time.
>>>
>>> And if you change doctors? Maybe that's a job for some sort of role/group
>>> model.
>>>
>>>
>>> >
>>> > Adam
>>> >
>>> > On Aug 9, 2012 10:19 PM, "Josh Elser" <jo...@gmail.com> wrote:
>>> >>
>>> >> I've thought quite a bit about the approach you've outlined
>>> >> previously..
>>> >>
>>> >> The main caveat I've always struggled to overcome is how to
>>> >> encapsulate
>>> >> *when* a physician should have access to your records. This expands
>>> >> the
>>> >> problem into n*m roles which becomes difficult to manage inside
>>> >> Accumulo,
>>> >> especially as time elapses.
>>> >>
>>> >> On 8/8/2012 6:29 PM, Marc Parisi wrote:
>>> >>>
>>> >>> Just some ideas and thoughts....
>>> >>>
>>> >>> With a system I'm building I have code to take care of user roles.
>>> >>> Roles
>>> >>> will define visibilities, how analysis is performed, information
>>> >>> sharing, etc. I have a particular role for sharing. I also have an
>>> >>> area
>>> >>> of interest, usually assigned to a physician role, therefore only a
>>> >>> physician's office can see certain data from it. The data
>>> >>> corresponding
>>> >>> to a given person can be accessed by that person ( if they have app
>>> >>> access ), the physician that created it, and other physicians ( with
>>> >>> a
>>> >>> different area of interest ) with whom the user wants to share their
>>> >>> data. Each area of interest will be cryptographically secured. Our
>>> >>> approach will utilize multiple crypto technologies. I would suggest
>>> >>> making crypto your last stop. Focus on getting
>>> >>> the visibility hierarchy designed. HIPAA requirements can come later.
>>> >>>
>>> >>> In my approach, there is no elevation of fields per se. Instead,
>>> >>> there
>>> >>> are visibiilities for all assigned parties,so in my case it is a
>>> >>> matter
>>> >>> of labeling. The data can have hierarchies, and each hierarchy has
>>> >>> different labels to control access.
>>> >>>
>>> >>> " Patient demographic fields are PHI (personal health information)
>>> >>> and
>>> >>> these should not be visible to all who want to perform analysis, but
>>> >>> only to main administrators,
>>> >>> patient and maybe physician. I assume these would have to have
>>> >>> separate authorization label. "
>>> >>>
>>> >>> Yes. I think this is where roles will help. Assign roles and
>>> >>> visibilities to those roles. As of right now, I'm putting ephemeral data
>>> >>> in my visibilities ( user ID for a physician, among other things ). I
>>> >>> will probably move this to the qualifier and take a more simple approach
>>> >>> to visibilities.
>>> >>>
>>> >>> Each role has different actions. Right now I have four actions; syncing,
>>> >>> querying, deleting, and sharing. You don't have to capture actions, but
>>> >>> you might want to limit how the roles of users vary, and I think
>>> >>> modeling the security actions within each role is an excellent way to do
>>> >>> so.
>>> >>>
>>> >>>
>>> >>> On Wed, Aug 8, 2012 at 4:08 PM, Edmon Begoli <ebegoli@gmail.com
>>> >>> <ma...@gmail.com>> wrote:
>>> >>>
>>> >>>     I am trying to model the healthcare claim on accumulo and I want to
>>> >>>     lay it out so that it:
>>> >>>
>>> >>>     A. Accurately reflects the structure of the claim
>>> >>>
>>> >>>     B. I could have controls finely applied to different sections of the
>>> >>>     document
>>> >>>
>>> >>>     I am simplifying matter but claim contains claim document identifiers,
>>> >>>     demographics of the patient, and line items for the procedures
>>> >>>     performed:
>>> >>>
>>> >>>     claim identifier, data submitted, data processed, state of origin, ...
>>> >>>     patient name, dob, location, other identifiers
>>> >>>     procedure 1 code, procedure 1 provider, procedure 1 cost, ...
>>> >>>     ...
>>> >>>     procedure n code, procedure n provider, procedure n cost, ...
>>> >>>
>>> >>>
>>> >>>     Patient demographic fields are PHI (personal health information) and
>>> >>>     these should not be visible to all who want to perform analysis, but
>>> >>>     only to main administrators,
>>> >>>     patient and maybe physician. I assume these would have to have
>>> >>>     separate authorization label.
>>> >>>
>>> >>>     Other fields may be visible to different groups of people - i.e.
>>> >>>     federal claim administrators can see all, but  regional offices can
>>> >>>     only see their states.
>>> >>>     Separate, more permissive labels.
>>> >>>
>>> >>>     Finally, it might make sense to "elevate" some fields for easy access
>>> >>>     and analysis - ie. diagnostic codes, zip code, cost.
>>> >>>     This would not be a matter of labels, but data design.
>>> >>>
>>> >>>
>>> >>>     With all this in mind, I would welcome if anyone has any security and
>>> >>>     data design suggestions.
>>> >>>
>>> >>>
>>> >
>>
>>
>
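
The role/action model in the quoted message (four actions: syncing, querying, deleting, sharing) could be sketched in plain Java roughly as follows. The role names and the actions granted to each role are hypothetical, chosen only to illustrate the idea; nothing here is an Accumulo API.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of modeling security actions per role. The four actions come from
 * the quoted message; the roles and the policy table are hypothetical.
 */
class RoleActions {
    enum Action { SYNC, QUERY, DELETE, SHARE }

    enum Role { PATIENT, PHYSICIAN, ANALYST }

    // Which actions each role may perform (illustrative policy only).
    private static final Map<Role, Set<Action>> POLICY = Map.of(
            Role.PATIENT, EnumSet.of(Action.SYNC, Action.QUERY, Action.SHARE),
            Role.PHYSICIAN, EnumSet.of(Action.SYNC, Action.QUERY),
            Role.ANALYST, EnumSet.of(Action.QUERY));

    /** True if the policy lists the action for the role; unknown roles get nothing. */
    static boolean isAllowed(Role role, Action action) {
        return POLICY.getOrDefault(role, EnumSet.noneOf(Action.class)).contains(action);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Role.PHYSICIAN, Action.SHARE)); // false
        System.out.println(isAllowed(Role.PATIENT, Action.SHARE));   // true
    }
}
```

Keeping actions in a per-role table like this limits how far individual users' permissions can drift apart, which is the benefit the message points to.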

Re: Security and data design advice on structuring data on accumulo

Posted by David Medinets <da...@gmail.com>.

On Fri, Aug 10, 2012 at 12:28 PM, Josh Elser <jo...@gmail.com> wrote:
> I'm sure a user-roles approach could work to a point; but I feel like there
> is potential for a much more elegant solution. I'm curious if others have
> had thoughts about this.

Can you say why you feel there might be a more elegant solution?
Users and roles have been the primary (if not only) approach for major
access control systems. Or have I missed hearing about something?

Re: Security and data design advice on structuring data on accumulo

Posted by Josh Elser <jo...@gmail.com>.

The underlying issue I'm poking at is this:

Pluggable authorization systems I've seen attached to Accumulo in the
past have operated in the following fashion: a single superuser in
Accumulo has all of the authorizations for data stored in Accumulo. The
authorization system determines the correct Accumulo Authorizations for
the current user and intersects the user's Authorizations with the
superuser's Authorizations (read as: all Authorizations) to perform a
scan over Accumulo at the desired level. Thus, end users don't have
accounts on Accumulo; user queries run as the superuser.
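
A minimal sketch of that intersection step in plain Java, with no Accumulo dependency. The class name and the idea that user labels arrive as a plain set from an external service are assumptions for illustration; in real client code the resulting set would be wrapped in an `Authorizations` object and passed when creating the scanner.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/**
 * Sketch of the "single superuser" pattern: end users have no Accumulo
 * accounts; an external service (hypothetical) supplies each user's labels,
 * and the query layer scans as the superuser using only labels both hold.
 */
class ScanAuthResolver {
    private final Set<String> superuserAuths;

    ScanAuthResolver(Set<String> superuserAuths) {
        // Defensive copy; TreeSet keeps iteration order deterministic.
        this.superuserAuths = Collections.unmodifiableSet(new TreeSet<>(superuserAuths));
    }

    /** Labels to request for this user's scan: the intersection of both sets. */
    Set<String> authsFor(Set<String> userAuthsFromExternalService) {
        Set<String> result = new TreeSet<>(userAuthsFromExternalService);
        result.retainAll(superuserAuths);
        return result;
    }

    public static void main(String[] args) {
        ScanAuthResolver resolver = new ScanAuthResolver(
                Set.of("phi", "claims", "regionTN", "regionMA", "admin"));
        // Hypothetical labels returned for a regional analyst.
        System.out.println(resolver.authsFor(Set.of("claims", "regionTN", "unknownLabel")));
        // prints [claims, regionTN]
    }
}
```

The intersection is what keeps the scan at "the desired level": because the superuser holds every label, scanning with its full set would expose everything.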

Back to the current example, as you said, the number of "groups" should
grow roughly linearly with the number of users; however, this now requires
that every user has an Accumulo account. The difference is that a doctor
will be in many users' groups (e.g. you and I could share a doctor). To
my understanding, all of this user/authorization information is stored
inside of ZooKeeper. It seems less than ideal to me to store user
accounts for every patient and every doctor, where every doctor has many
"roles", but it also appears intractable to me to have a
single superuser with all auths (as previously outlined).

I'm sure a user-roles approach could work to a point; but I feel like 
there is potential for a much more elegant solution. I'm curious if 
others have had thoughts about this.

Re: Security and data design advice on structuring data on accumulo

Posted by Adam Fuchs <af...@apache.org>.

But that's not really n*m, since it only specifies me by name. This should
be roughly linear with users, no?

There is definitely a reliance on some external service managing the roles
that docs are in, but this should be tractable.
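
The n*m versus linear question can be stated as a simple count. The symbols are introduced here for illustration: P patients, D doctors, one label such as "adamsHealthTeam" per patient, and t_p the number of doctors on patient p's team.

```latex
\begin{aligned}
\#\text{labels} &= P \\
G = \#\text{(doctor, label) grants} &= \sum_{p=1}^{P} t_p \;\le\; P \cdot D \\
t_p \le t_{\max} \text{ for all } p \;\Rightarrow\; G &\le t_{\max} \cdot P
\end{aligned}
```

So the number of labels is linear in patients, and the total number of grants reaches P*D only if every doctor is on every team; with small care teams it is also linear. What still grows is each doctor's own authorization set, which holds one label per patient that doctor treats.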

Adam

Re: Security and data design advice on structuring data on accumulo

Posted by Josh Elser <jo...@gmail.com>.

That's what I meant, user*doctors.

It's not enough to say "healthteam", you have to qualify it by user too: 
"adamhealthteam".

Re: Security and data design advice on structuring data on accumulo

Posted by Adam Fuchs <af...@apache.org>.

I guess I should have specified that the access time labels should be used
in conjunction with the role labels, like
"(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)".
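
In Accumulo this check is done server-side by `ColumnVisibility` and `VisibilityEvaluator`. The plain-Java sketch below is not that code; it only illustrates how an expression like the one above is evaluated against a user's authorization set. It supports unquoted labels (roughly the character set Accumulo allows), `&`, `|`, and parentheses, and, like Accumulo, rejects mixing `&` and `|` at one level without parentheses.

```java
import java.util.Set;

/** Minimal recursive-descent evaluator for Accumulo-style visibility expressions. */
class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) return true; // an empty visibility is readable by everyone
        VisibilitySketch p = new VisibilitySketch(expr, auths);
        boolean result = p.parseExpr();
        if (p.pos != expr.length()) {
            throw new IllegalArgumentException("Unexpected '" + expr.charAt(p.pos) + "' at " + p.pos);
        }
        return result;
    }

    // expr := term (('&' term)* | ('|' term)*)
    private boolean parseExpr() {
        boolean value = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && c != op) {
                throw new IllegalArgumentException("Mixed & and | need parentheses at " + (pos - 1));
            }
            op = c;
            boolean rhs = parseTerm(); // always parsed, so syntax errors are still caught
            value = (c == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := label | '(' expr ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean v = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return v;
        }
        int start = pos;
        while (pos < expr.length() && isLabelChar(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected label at " + pos);
        return auths.contains(expr.substring(start, pos));
    }

    private static boolean isLabelChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }

    public static void main(String[] args) {
        String vis = "(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)";
        System.out.println(canSee(vis, Set.of("adamsHealthTeam", "regularCheckup")));    // true
        System.out.println(canSee(vis, Set.of("adamsHealthTeam")));                       // false
        System.out.println(canSee(vis, Set.of("massStateResearcher", "populationStudy"))); // true
    }
}
```

The point of combining the two kinds of label is visible in the second call: being on Adam's health team is not enough without an access-purpose label.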

Adam

Re: Security and data design advice on structuring data on accumulo

Posted by Benson Margulies <bi...@gmail.com>.

On Fri, Aug 10, 2012 at 8:52 AM, Adam Fuchs <af...@apache.org> wrote:
> Not sure I understand why this gets into n*m roles. Can you elaborate?
>
> The question of when your physician should have access seems like it could
> be represented by just a few labels, like "regularCheckup",
> "illnessEvaluation", and "populationStudy". Those labels could then be tied
> to an auditing system that could verify appropriateness of access over time.

And if you change doctors? Maybe that's a job for some sort of role/group model.
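
One way to read the role/group suggestion: the stored cells keep a stable label such as "adamsHealthTeam", and a separate registry (hypothetical here, not part of Accumulo) tracks which doctors are on each team, so changing doctors only changes membership. A minimal sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical external registry: team label -> doctors on that team.
 * Data labels never change; a doctor's scan-time labels are derived
 * from current membership.
 */
class HealthTeamRegistry {
    private final Map<String, Set<String>> teamToDoctors = new HashMap<>();

    void assign(String team, String doctor) {
        teamToDoctors.computeIfAbsent(team, t -> new TreeSet<>()).add(doctor);
    }

    void unassign(String team, String doctor) {
        Set<String> doctors = teamToDoctors.get(team);
        if (doctors != null) doctors.remove(doctor);
    }

    /** Labels this doctor should hold right now, based on team membership. */
    Set<String> authsFor(String doctor) {
        Set<String> auths = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : teamToDoctors.entrySet()) {
            if (e.getValue().contains(doctor)) auths.add(e.getKey());
        }
        return auths;
    }

    public static void main(String[] args) {
        HealthTeamRegistry registry = new HealthTeamRegistry();
        registry.assign("adamsHealthTeam", "drSmith");
        // Adam changes doctors: only membership changes, stored labels stay the same.
        registry.unassign("adamsHealthTeam", "drSmith");
        registry.assign("adamsHealthTeam", "drJones");
        System.out.println(registry.authsFor("drSmith")); // []
        System.out.println(registry.authsFor("drJones")); // [adamsHealthTeam]
    }
}
```

With per-user Accumulo accounts, the derived set would still have to be pushed to each doctor's account (for example through the security operations' `changeUserAuthorizations` call); with the single-superuser approach it would simply feed the intersection step at query time.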

Re: Security and data design advice on structuring data on accumulo

Posted by Adam Fuchs <af...@apache.org>.

Not sure I understand why this gets into n*m roles. Can you elaborate?

The question of when your physician should have access seems like it could
be represented by just a few labels, like "regularCheckup",
"illnessEvaluation", and "populationStudy". Those labels could then be tied
to an auditing system that could verify appropriateness of access over time.
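To make this concrete: purpose labels like these can be combined with team labels in a single visibility expression, for example `(adamsHealthTeam&(regularCheckup|illnessEvaluation))|(massStateResearcher&populationStudy)`. Below is a minimal standalone Java sketch of how such an expression is checked against a user's authorizations. It is a simplified re-implementation for illustration, not Accumulo's own `ColumnVisibility`/`VisibilityEvaluator` code. It handles unquoted terms, `&`, `|`, and parentheses, and treats an empty expression as visible to everyone. Like Accumulo, it rejects mixing `&` and `|` at one level without parentheses. The class name `VisibilitySketch` is hypothetical.

```java
import java.util.Set;

/**
 * Simplified sketch of Accumulo-style visibility evaluation (hypothetical class,
 * not Accumulo's implementation). Quoted terms are not supported.
 */
public class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    /** Returns true if the authorizations satisfy the expression. */
    public static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) {
            return true; // an empty label is visible to everyone
        }
        VisibilitySketch p = new VisibilitySketch(expr, auths);
        boolean result = p.parseGroup();
        if (p.pos != expr.length()) {
            throw new IllegalArgumentException("Unexpected character at " + p.pos);
        }
        return result;
    }

    // group := operand (op operand)*, with the same op throughout one level
    private boolean parseGroup() {
        boolean value = parseOperand();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && next != op) {
                throw new IllegalArgumentException("Mixing & and | needs parentheses");
            }
            op = next;
            boolean rhs = parseOperand(); // always parse, even if the result is already known
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // operand := '(' group ')' | term
    private boolean parseOperand() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++; // consume '('
            boolean value = parseGroup();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++; // consume ')'
            return value;
        }
        int start = pos;
        while (pos < expr.length() && isTermChar(expr.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a term at " + start);
        }
        return auths.contains(expr.substring(start, pos));
    }

    // Unquoted terms: ASCII letters, digits, and _ - : . /
    private static boolean isTermChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || "_-:./".indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        String vis = "(adamsHealthTeam&(regularCheckup|illnessEvaluation))"
                + "|(massStateResearcher&populationStudy)";
        System.out.println(canSee(vis, Set.of("adamsHealthTeam", "regularCheckup")));      // true
        System.out.println(canSee(vis, Set.of("adamsHealthTeam", "populationStudy")));     // false
        System.out.println(canSee(vis, Set.of("massStateResearcher", "populationStudy"))); // true
    }
}
```

Because the purpose of access is carried by a label such as `regularCheckup`, an auditing system can record which purpose authorization was used for each scan.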

Adam
On Aug 9, 2012 10:19 PM, "Josh Elser" <jo...@gmail.com> wrote:

> I've thought quite a bit about the approach you've outlined previously.
>
> The main caveat I've always struggled to overcome is how to encapsulate
> *when* a physician should have access to your records. This expands the
> problem into n*m roles, which become difficult to manage inside Accumulo,
> especially as time elapses.

Re: Security and data design advice on structuring data on accumulo

Posted by Josh Elser <jo...@gmail.com>.

I've thought quite a bit about the approach you've outlined previously.

The main caveat I've always struggled to overcome is how to encapsulate
*when* a physician should have access to your records. This expands the
problem into n*m roles, which become difficult to manage inside
Accumulo, especially as time elapses.

On 8/8/2012 6:29 PM, Marc Parisi wrote:
> Just some ideas and thoughts....
>
> With a system I'm building I have code to take care of user roles. Roles
> will define visibilities, how analysis is performed, information
> sharing, etc. I have a particular role for sharing. I also have an area
> of interest, usually assigned to a physician role, therefore only a
> physician's office can see certain data from it. The data corresponding
> to a given person can be accessed by that person ( if they have app
> access ), the physician that created it, and other physicians ( with a
> different area of interest ) with whom the user wants to share their
> data. Each area of interest will be cryptographically secured. Our
> approach will utilize multiple crypto technologies. I would suggest
> making crypto your last stop. Focus on getting
> the visibility hierarchy designed. HIPAA requirements can come later.
>
> In my approach, there is no elevation of fields per se. Instead, there
> are visibilities for all assigned parties, so in my case it is a matter
> of labeling. The data can have hierarchies, and each hierarchy has
> different labels to control access.
>
> " Patient demographic fields are PHI (personal health information) and
> these should not be visible to all who want to perform analysis, but
> only to main administrators,
> patient and maybe physician. I assume these would have to have
> separate authorization label. "
>
> Yes. I think this is where roles will help. Assign roles and
> visibilities to those roles. As of right now, I'm putting ephemeral data
> in my visibilities ( user ID for a physician, among other things ). I
> will probably move this to the qualifier and take a more simple approach
> to visibilities.
>
> Each role has different actions. Right now I have four actions: syncing,
> querying, deleting, and sharing. You don't have to capture actions, but
> you might want to limit how the roles of users vary, and I think
> modeling the security actions within each role is an excellent way to do
> so.
>

Re: Security and data design advice on structuring data on accumulo

Posted by Edmon Begoli <eb...@gmail.com>.

Thanks, Marc, for the ideas and thoughts. Very helpful.

A few more details on my labeling requirements: in my case there is a
matrix of permissions that reflects an individual's visibility into the
data, his/her providers' rights, and then a whole state/federal
hierarchy of permissions.

For example:

John Doe from Tennessee can see his/her data, and so can his/her physician.

Doe's state Medicare/Medicaid administrator can see this PHI data as well.

So can the federal Medicare or Medicaid administrator.

Then, if Tennessee and Georgia have data use agreements, they can see
each other's state data.
(This is very much like law enforcement.)

In general, there is a matrix of roles.
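Here is one way part of this matrix might be expressed, sketched under stated assumptions. PHI fields carry a label naming the patient, the physician, the home-state administrator, and the federal administrator. Non-PHI claim fields carry a state label. A data use agreement is applied on the authorization side: a Georgia administrator is also granted `state_TN`, so stored labels would not have to be rewritten when an agreement changes. All label names, the agreement table, and the assumption that agreements cover only non-PHI data are hypothetical. The resulting strings could be handed to Accumulo's `ColumnVisibility` (on write) and `Authorizations` (on scan).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of a state/federal label scheme for claims. */
public class RoleMatrixSketch {
    // Hypothetical data use agreements: state -> partner states whose data it may read.
    static final Map<String, Set<String>> AGREEMENTS = Map.of(
            "TN", Set.of("GA"),
            "GA", Set.of("TN"));

    /** Label for PHI fields: patient, physician, home-state admin, federal admin. */
    static String phiVisibility(String patientId, String physicianId, String state) {
        return "patient_" + patientId + "|physician_" + physicianId
                + "|phi_" + state + "|federal_admin";
    }

    /** Label for non-PHI claim fields: home state plus federal admin. */
    static String claimVisibility(String state) {
        return "state_" + state + "|federal_admin";
    }

    /** Authorizations for a state Medicare/Medicaid administrator (sorted for readability). */
    static Set<String> stateAdminAuths(String state) {
        Set<String> auths = new TreeSet<>();
        auths.add("phi_" + state);   // own state's PHI
        auths.add("state_" + state); // own state's claim data
        for (String partner : AGREEMENTS.getOrDefault(state, Set.of())) {
            auths.add("state_" + partner); // partner's non-PHI data only (assumption)
        }
        return auths;
    }

    public static void main(String[] args) {
        System.out.println(phiVisibility("jdoe", "drsmith", "TN"));
        System.out.println(claimVisibility("TN"));
        System.out.println(stateAdminAuths("GA")); // [phi_GA, state_GA, state_TN]
    }
}
```

A federal administrator would simply hold `federal_admin`, which every label above accepts.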

Btw, I welcome any collaboration on this aspect of privacy research in
the context of Accumulo that would result in open source software
development and R&D publications.

Regards,
Edmon

On Wed, Aug 8, 2012 at 6:29 PM, Marc Parisi <ma...@accumulo.net> wrote:
> Just some ideas and thoughts....
>
> With a system I'm building I have code to take care of user roles. Roles
> will define visibilities, how analysis is performed, information sharing,
> etc. I have a particular role for sharing. I also have an area of interest,
> usually assigned to a physician role, therefore only a physician's office
> can see certain data from it. The data corresponding to a given person can
> be accessed by that person ( if they have app access ), the physician that
> created it, and other physicians ( with a different area of interest ) with
> whom the user wants to share their data. Each area of interest will be
> cryptographically secured. Our approach will utilize multiple crypto
> technologies. I would suggest making crypto your last stop. Focus on getting
> the visibility hierarchy designed. HIPAA requirements can come later.
>
> In my approach, there is no elevation of fields per se. Instead, there are
> visibilities for all assigned parties, so in my case it is a matter of
> labeling. The data can have hierarchies, and each hierarchy has different
> labels to control access.
>
> " Patient demographic fields are PHI (personal health information) and
> these should not be visible to all who want to perform analysis, but
> only to main administrators,
> patient and maybe physician. I assume these would have to have
> separate authorization label. "
>
> Yes. I think this is where roles will help. Assign roles and visibilities to
> those roles. As of right now, I'm putting ephemeral data in my visibilities
> ( user ID for a physician, among other things ). I will probably move this
> to the qualifier and take a more simple approach to visibilities.
>
> Each role has different actions. Right now I have four actions: syncing,
> querying, deleting, and sharing. You don't have to capture actions, but you
> might want to limit how the roles of users vary, and I think modeling the
> security actions within each role is an excellent way to do so.
>

Re: Security and data design advice on structuring data on accumulo

Posted by Marc Parisi <ma...@accumulo.net>.

Just some ideas and thoughts...

With a system I'm building, I have code to take care of user roles. Roles
will define visibilities, how analysis is performed, information sharing,
etc. I have a particular role for sharing. I also have an area of interest,
usually assigned to a physician role, so only a physician's office can see
certain data from it. The data corresponding to a given person can be
accessed by that person (if they have app access), the physician who
created it, and other physicians (with a different area of interest) with
whom the user wants to share their data. Each area of interest will be
cryptographically secured; our approach will use multiple crypto
technologies. I would suggest making crypto your last stop. Focus on
getting the visibility hierarchy designed. HIPAA requirements can come
later.

In my approach, there is no elevation of fields per se. Instead, there are
visibilities for all assigned parties, so in my case it is a matter of
labeling. The data can have hierarchies, and each hierarchy has different
labels to control access.

" Patient demographic fields are PHI (personal health information) and
these should not be visible to all who want to perform analysis, but
only to main administrators,
patient and maybe physician. I assume these would have to have
separate authorization label. "

Yes. I think this is where roles will help. Assign roles, and assign
visibilities to those roles. Right now I'm putting ephemeral data in my
visibilities (the user ID for a physician, among other things). I will
probably move this to the column qualifier and take a simpler approach to
visibilities.
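As a rough illustration of section-level labels, combined with the idea of moving the physician's user ID out of the label and into the column qualifier, here is a sketch of one claim laid out as Accumulo-style cells. The `Cell` record only mimics an Accumulo key/value so the example runs without the Accumulo client. With the real API, each cell would roughly correspond to a `Mutation.put` call carrying a `ColumnVisibility`. The label names, the `procedure_NNN` family scheme, and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical layout of one claim as Accumulo-style cells. */
public class ClaimLayoutSketch {
    /** Mimics an Accumulo entry: row, column family, column qualifier, visibility, value. */
    record Cell(String row, String family, String qualifier, String visibility, String value) {}

    /** Each procedure is {code, provider, cost} (hypothetical shape). */
    static List<Cell> layoutClaim(String claimId, String state, String patientName,
                                  String physicianId, List<String[]> procedures) {
        // Hypothetical labels, one per section of the claim.
        String claimVis = "state_" + state + "|federal_admin";
        String phiVis = "physician|phi_" + state + "|federal_admin";

        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(claimId, "claim", "state", claimVis, state));
        cells.add(new Cell(claimId, "patient", "name", phiVis, patientName));
        // Which physician is recorded in the qualifier; the label only says "physician".
        cells.add(new Cell(claimId, "patient", "physician:" + physicianId, phiVis, ""));
        for (int i = 0; i < procedures.size(); i++) {
            String[] p = procedures.get(i);
            // Zero-padded so procedure families sort in order under lexicographic key order.
            String family = String.format("procedure_%03d", i + 1);
            cells.add(new Cell(claimId, family, "code", claimVis, p[0]));
            cells.add(new Cell(claimId, family, "provider", claimVis, p[1]));
            cells.add(new Cell(claimId, family, "cost", claimVis, p[2]));
        }
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = layoutClaim("claim-0001", "TN", "John Doe", "drsmith",
                List.of(new String[] {"P1", "provA", "100.00"},
                        new String[] {"P2", "provB", "250.00"}));
        cells.forEach(System.out::println);
    }
}
```

One consequence of this design: since the label only says `physician`, Accumulo would return these PHI cells to any holder of that authorization, so restricting access to the specific physician named in the qualifier falls to application code.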

Each role has different actions. Right now I have four actions: syncing,
querying, deleting, and sharing. You don't have to capture actions, but you
might want to limit how users' roles vary, and I think modeling the
security actions within each role is an excellent way to do so.
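The four actions could be modeled as a simple role-to-actions table. Accumulo's visibility labels govern which cells a user can read, not actions such as sharing, so a check like this would presumably live in the application layer. The role names and their action sets below are hypothetical.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical role-to-action table for application-level checks. */
public class RoleActionsSketch {
    /** The four actions: syncing, querying, deleting, and sharing. */
    enum Action { SYNC, QUERY, DELETE, SHARE }

    // Hypothetical roles and the actions each may perform.
    static final Map<String, Set<Action>> ROLE_ACTIONS = Map.of(
            "patient", EnumSet.of(Action.QUERY, Action.SHARE),
            "physician", EnumSet.of(Action.SYNC, Action.QUERY),
            "administrator", EnumSet.allOf(Action.class));

    /** True if any of the user's roles permits the action; unknown roles permit nothing. */
    static boolean isAllowed(Set<String> userRoles, Action action) {
        for (String role : userRoles) {
            if (ROLE_ACTIONS.getOrDefault(role, Set.of()).contains(action)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(Set.of("physician"), Action.SYNC));    // true
        System.out.println(isAllowed(Set.of("patient"), Action.DELETE));    // false
        System.out.println(isAllowed(Set.of("administrator"), Action.DELETE)); // true
    }
}
```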

On Wed, Aug 8, 2012 at 4:08 PM, Edmon Begoli <eb...@gmail.com> wrote:

> I am trying to model the healthcare claim on accumulo and I want to
> lay it out so that it:
>
> A. Accurately reflects the structure of the claim
>
> B. I could have controls finely applied to different sections of the
> document
>
> I am simplifying matters, but a claim contains claim document identifiers,
> demographics of the patient, and line items for the procedures
> performed:
>
> claim identifier, date submitted, date processed, state of origin, ...
> patient name, dob, location, other identifiers
> procedure 1 code, procedure 1 provider, procedure 1 cost, ...
> ...
> procedure n code, procedure n provider, procedure n cost, ...
>
>
> Patient demographic fields are PHI (personal health information) and
> these should not be visible to all who want to perform analysis, but
> only to main administrators,
> patient and maybe physician. I assume these would have to have
> separate authorization label.
>
> Other fields may be visible to different groups of people, e.g.
> federal claim administrators can see all, but regional offices can
> only see their states.
> Separate, more permissive labels.
>
> Finally, it might make sense to "elevate" some fields for easy access
> and analysis, e.g. diagnostic codes, zip code, cost.
> This would not be a matter of labels, but data design.
>
>
> With all this in mind, I would welcome any security and data design
> suggestions anyone has.
>