Posted to notifications@accumulo.apache.org by "Christopher Tubbs (Jira)" <ji...@apache.org> on 2020/10/28 22:46:00 UTC

[jira] [Resolved] (ACCUMULO-3970) Generating multiple views of a value at scan time

     [ https://issues.apache.org/jira/browse/ACCUMULO-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs resolved ACCUMULO-3970.
-----------------------------------------
    Resolution: Won't Fix

Closing this stale issue. If this is still a problem, please open a new issue or PR at https://github.com/apache/accumulo

Also, this can be done either by storing the different representations directly, or by scanning as a proxy user who is authorized to view the data and transforming it in an iterator at scan time.
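
As a rough illustration of the scan-time option, here is a minimal Python sketch of the date-of-birth example from the issue quoted below. It is plain logic, not Accumulo code: the function names (age_on, dob_view), the use of a set of strings for the scan authorizations, and the choice to report a capped birth year for patients aged 90 or over are all assumptions made for this example.

```python
from datetime import date


def age_on(dob: date, today: date) -> int:
    """Age in whole years on the given day."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years


def dob_view(dob_iso: str, auths: set, today: date):
    """Return the representation of a date of birth that `auths` may see.

    PII_DOB and SHD_DOB are the labels used in the issue; how the
    authorizations reach this function is an assumption of the sketch.
    """
    if "PII_DOB" in auths:
        return dob_iso  # full date for fully authorized readers
    if "SHD_DOB" in auths:
        dob = date.fromisoformat(dob_iso)
        if age_on(dob, today) >= 90:
            # Assumption: "pretend the patient is 90" is encoded as a
            # capped year of birth. Other encodings are possible.
            return str(today.year - 90)
        return str(dob.year)
    return None  # no suitable authorization, nothing is returned


print(dob_view("1925-08-22", {"SHD_DOB"}, date(2015, 8, 21)))
```

In an actual deployment the equivalent logic would presumably sit in a custom server-side iterator, with the proxy user holding the authorization needed to read the stored full date.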

> Generating multiple views of a value at scan time
> -------------------------------------------------
>
>                 Key: ACCUMULO-3970
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3970
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Russ Weeks
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It would be useful to have the ability to generate different representations of a key-value pair at scan time, based on the scan authorizations.
> For example, consider [HIPAA safe harbour de-identification|http://www.hhs.gov/ocr/privacy/hipaa/understanding/coveredentities/De-identification/guidance.html#dates]. One of the rules for de-identifying a patient's date of birth is that if a patient is 89 years old or younger, you can disclose his exact year of birth. If a patient is 90 years old or over, you pretend that he's 90 years old.
> You can imagine implementing this as a key/value mapping in Accumulo like:
> {{(pt_id, demographic, pt_dob, PII_DOB) -> "1925-08-22"}}
> {{(pt_id, demographic, pt_dob, SHD_DOB) -> "1925"}}
> Where the value corresponding to visibility SHD_DOB is produced at scan-time, depending on the patient's current age.
> Another example would be the ability to produce a salted hash of a unique identifier like a social security number or medical record number, where the salt (or the hash algorithm, or the work factor...) could be specified dynamically without having to re-code all the values in the system.
> More broadly speaking, this feature would give organizations more flexibility to change how they de-identify, transform or anonymize data to suit different access levels.
> Of course, to do this you'd need to have a pluggable component that can process key/value pairs before visibilities are evaluated. I can see why this might give a lot of people the heebie-jeebies, but I'd like to gather as much feedback as possible. Looking forward to hearing your thoughts!
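
For the salted-hash example in the issue description, a minimal Python sketch of deriving a pseudonym at read time might look like the following. The function name pseudonymize, the choice of PBKDF2 with SHA-256, and the default iteration count are assumptions for illustration only; the issue does not prescribe an algorithm.

```python
import hashlib


def pseudonymize(identifier: str, salt: bytes, iterations: int = 100_000) -> str:
    """Derive a hex pseudonym from an identifier such as an SSN or MRN.

    The salt and the work factor (iterations) are parameters, so they can
    be changed per request without rewriting the stored values.
    """
    digest = hashlib.pbkdf2_hmac(
        "sha256", identifier.encode("utf-8"), salt, iterations
    )
    return digest.hex()


# The same identifier yields unrelated pseudonyms under different salts.
a = pseudonymize("123-45-6789", b"study-A")
b = pseudonymize("123-45-6789", b"study-B")
print(a != b)
```

Because the stored value stays unchanged and only the derivation parameters vary, switching salt or work factor would not require re-encoding the data, which is the flexibility the issue asks for.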



--
This message was sent by Atlassian Jira
(v8.3.4#803005)