Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2013/09/19 01:54:51 UTC

[jira] [Commented] (HBASE-9578) Client side cell encryption

    [ https://issues.apache.org/jira/browse/HBASE-9578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771415#comment-13771415 ] 

Andrew Purtell commented on HBASE-9578:
---------------------------------------

First, an HTable wrapper by definition must be explicitly used by an application. It is an easy solution to implement but is not transparent to the end user. Before HBase 0.95/0.96, with its new RPC codecs, a wrapper was the only implementation choice that avoided invasive changes to the client library. After 0.95/0.96, RPC codecs offer an interesting option for adding client side value encryption (and/or compression) in a more transparent way.
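To make the wrapper idea concrete, here is a minimal sketch using only JDK classes. SimpleTable, InMemoryTable, and EncryptingTable are hypothetical names invented for this illustration; they are not the HBase client API and not the implementation offered in this issue. AES-GCM with a random 12-byte IV is likewise an assumed cipher choice.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.TreeMap;

public class EncryptingTableSketch {

    /** Hypothetical stand-in for a table API; NOT the real HTable interface. */
    public interface SimpleTable {
        void put(String row, byte[] value);
        byte[] get(String row);
    }

    /** In-memory "server" that only ever sees what the client sends it. */
    public static class InMemoryTable implements SimpleTable {
        private final Map<String, byte[]> cells = new TreeMap<>();
        @Override public void put(String row, byte[] value) { cells.put(row, value); }
        @Override public byte[] get(String row) { return cells.get(row); }
    }

    /** Wrapper the application must opt into: values are encrypted, row keys are not. */
    public static class EncryptingTable implements SimpleTable {
        private static final int IV_BYTES = 12;
        private static final int TAG_BITS = 128;
        private final SimpleTable delegate;
        private final SecretKey key;
        private final SecureRandom random = new SecureRandom();

        public EncryptingTable(SimpleTable delegate, byte[] rawAesKey) {
            this.delegate = delegate;
            this.key = new SecretKeySpec(rawAesKey, "AES");
        }

        @Override public void put(String row, byte[] value) {
            try {
                byte[] iv = new byte[IV_BYTES];
                random.nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
                byte[] ct = cipher.doFinal(value);
                // Stored cell layout (an assumption of this sketch): IV || ciphertext+tag.
                delegate.put(row, ByteBuffer.allocate(IV_BYTES + ct.length).put(iv).put(ct).array());
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        @Override public byte[] get(String row) {
            byte[] stored = delegate.get(row);
            if (stored == null) return null;
            try {
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
                return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        InMemoryTable server = new InMemoryTable();
        byte[] rawKey = new byte[16];
        new SecureRandom().nextBytes(rawKey);
        SimpleTable table = new EncryptingTable(server, rawKey);
        table.put("row1", "secret".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(table.get("row1"), StandardCharsets.UTF_8));
    }
}
```

The lack of transparency is visible in main: the application has to construct the wrapper itself and hand it the key, and anything that talks to the underlying table directly sees only ciphertext.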

Second, HBase is completely agnostic about value data but not about keys. Traditional encryption, if applied to key data as well, would destroy data locality and scan semantics. There are some deterministic, order-preserving encryption schemes which would maintain a sort ordering, but at the price of increased exposure to successful cryptanalysis.

Next, and related to the trouble with keys and sorting: if all cryptographic transformations are performed entirely on the client side, the server can never transform the encrypted data into plaintext, so several HBase API operations become impossible: append, increment, checkAndPut, checkAndDelete, and any scan filter that wants to examine cell values. For analytical workloads (short scans with highly selective filters, aggregating coprocessors) in particular, this requires transferring much more data to the client for processing there than would otherwise be needed.
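A small sketch of why an operation like checkAndPut breaks, assuming the client uses a conventional randomized mode (AES-GCM with a fresh IV per cell, my assumption here). The class and method names are hypothetical. The server can at best compare stored bytes, and two encryptions of the same plaintext do not match.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

public class RandomizedEqualityDemo {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Client-side encryption with a fresh random IV per call; returns IV || ciphertext+tag. */
    public static byte[] encrypt(byte[] rawAesKey, byte[] plaintext) {
        try {
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(rawAesKey, "AES"),
                        new GCMParameterSpec(128, iv));
            byte[] ct = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** All a key-less server could do for a check-and-mutate: compare raw bytes. */
    public static boolean serverSideEquals(byte[] storedCell, byte[] expectedCell) {
        return Arrays.equals(storedCell, expectedCell);
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        RANDOM.nextBytes(key);
        byte[] value = "42".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, value);    // what an earlier put wrote
        byte[] expected = encrypt(key, value);  // what a later checkAndPut would send
        // Same plaintext, same key, but the random IVs make the byte strings differ.
        System.out.println("server-side match: " + serverSideEquals(stored, expected));
    }
}
```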

We could consider sending private key material over in the RPC to work around this problem, but it is risky to ship private key material over the network ever, never mind frequently. So let's consider what can be done on the server as much as practical without sending over user private key material.

A naive option would be to implement a fully homomorphic encryption scheme. In theory, any operation on the server over encrypted data would be possible. Unfortunately fully homomorphic encryption in practice imposes overheads on the order of 10^9. There are however some practical but more limited schemes which may be useful.

At VLDB 2013, MIT CSAIL presented the paper "Processing Analytical Queries over Encrypted Data", which describes a research prototype, based on Postgres, capable of mixed operations over encrypted data on both the client and server side. They employ encryption schemes that are also applicable to restoring the HBase API operations mentioned above. "Deterministic encryption" with AES would make equality tests possible, restoring checkAndX operations, if we accept the leakage resulting from duplicates. Deterministic transformations also restore Append. Order-preserving encryption (OPE) can restore range scanning semantics, but with greater leakage leading to practical partial plaintext recovery. Maybe for some that would be an acceptable tradeoff. More interesting, Paillier homomorphic encryption supports addition, therefore summation, and could restore Increments and aggregating coprocessor functions like sum(). We might support some subset of scanning with filters by rewriting the filters passed in for a Scan with encryption-aware substitutions.
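To illustrate the Paillier point, here is a minimal textbook sketch using java.math.BigInteger. It assumes the common simplification g = n + 1; the class name, method names, and the 512-bit prime size are choices made for this example only and are not taken from the paper or from any HBase code. The add method is the only step a server would need to run, and it uses no private key material.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

public class PaillierSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Public key: n (with g = n + 1 implied). Private key: lambda, mu.
    public final BigInteger n;
    public final BigInteger nSquared;
    private final BigInteger lambda;
    private final BigInteger mu;

    public PaillierSketch(int primeBits) {
        BigInteger p = BigInteger.probablePrime(primeBits, RANDOM);
        BigInteger q;
        do {
            q = BigInteger.probablePrime(primeBits, RANDOM);
        } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        BigInteger p1 = p.subtract(BigInteger.ONE);
        BigInteger q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1)); // lcm(p-1, q-1)
        mu = lambda.modInverse(n);                   // valid because g = n + 1
    }

    /** Client side: encrypt m, with 0 <= m < n, using fresh randomness r. */
    public BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), RANDOM);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        // (n + 1)^m mod n^2 equals 1 + m*n mod n^2.
        BigInteger gm = BigInteger.ONE.add(m.multiply(n)).mod(nSquared);
        return gm.multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    /** Server side: multiplying ciphertexts adds the underlying plaintexts. */
    public BigInteger add(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2).mod(nSquared);
    }

    /** Client side: m = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) / n. */
    public BigInteger decrypt(BigInteger c) {
        BigInteger u = c.modPow(lambda, nSquared);
        return u.subtract(BigInteger.ONE).divide(n).multiply(mu).mod(n);
    }

    public static void main(String[] args) {
        PaillierSketch paillier = new PaillierSketch(512); // illustration-sized primes
        BigInteger counter = paillier.encrypt(BigInteger.valueOf(40));
        BigInteger delta = paillier.encrypt(BigInteger.valueOf(2));
        BigInteger incremented = paillier.add(counter, delta); // an "Increment" on ciphertext
        System.out.println(paillier.decrypt(incremented));     // 42
    }
}
```

Note that ciphertexts are elements modulo n^2, so each encrypted counter is far larger than the 8-byte long a plain Increment operates on; that storage and CPU cost would be part of the tradeoff.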

Of course, there is the problem of encrypting the data at the client with the correct scheme for the wanted semantics. The full design burden could be pushed to the user. Better, the mentioned paper describes a table designer executed at data import time to choose the optimal physical layout for the desired schema. Something like that could be developed for HBase as well leveraging the typed data library.
                
> Client side cell encryption
> ---------------------------
>
>                 Key: HBASE-9578
>                 URL: https://issues.apache.org/jira/browse/HBASE-9578
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>
> HBASE-7544 will protect key and value data on the server from accidental leakage by way of improperly disposed disks, improper direct filesystem access, or incorrect HDFS permissions. There are also use cases where sensitive data stored in a table or column family by a given user or application should be protected from all others, and the combination of transparent server-side storage encryption and transport security (SASL auth-conf) is still not sufficient. These instances call for a client side per-cell encryption feature, given the following additional observations:
> - The scope of transmission, distribution, and storage of private key material should be as limited as possible. The server is a centralized target (even in the case of an HBase cluster) where the scope of damage from a compromise is magnified if user key material also resides there or can be intercepted after compromise. Where keys are stored in hardware devices, e.g. smartcards, getting the keys to the server may not be possible anyway.
> - A client system is far more likely than a contended shared server resource to have necessary available CPU cycles for per-operation cryptographic overheads.
> For some cases we might not care so much about the second item, but the first is very important.
> I have an implementation of per cell client side encryption as an encrypting HTable wrapper which I could contribute if there is interest.
> This JIRA is also about brainstorming how to do better than that.
