Posted to issues@lucene.apache.org by "Fabio Germann (Jira)" <ji...@apache.org> on 2021/06/03 14:53:00 UTC

[jira] [Comment Edited] (LUCENE-9379) Directory based approach for index encryption

    [ https://issues.apache.org/jira/browse/LUCENE-9379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17356487#comment-17356487 ] 

Fabio Germann edited comment on LUCENE-9379 at 6/3/21, 2:52 PM:
----------------------------------------------------------------

Thanks [~broustant]/[~bruno.roustant], this is also something that I was looking for!

As for [~rcmuir]'s comment(s): I think the important distinction to be made is the goal of using encryption and the guarantees one needs.

If one needs tenant-based encryption at rest, OS-level encryption is a valid way to go. Likewise, if one needs maximum performance and wants to squeeze every last drop out of their NVMe drives, OS-level encryption (or no encryption) would probably be best.

BUT: In today's world there are sometimes things that are more important (or pose a greater risk) to a project or a company, namely user privacy and data protection. In such cases decreased performance is certainly acceptable (if not already anticipated).

Many of the above arguments against this contribution can be addressed one way or another. What can NOT be addressed (and why [~bruno.roustant]'s contribution is valuable) is:
 * It allows the stored content to be accessible only to Lucene (the process/thread), for exactly as long as Lucene needs to process the data, without any dependency on a downstream component.
 * It allows for platform interoperability/independence. Example: the solution can be deployed to Linux systems while being developed on macOS/Windows. (Sidenote: This is very important if large teams are working on a solution that builds on this.)
 * It can even offer protection from passive privileged users, meaning that the file on the filesystem is not readable by a privileged user. With OS-level encryption, by contrast, such protection would be more complex to achieve.
 * It allows for simple deployment in container technologies (which would be tricky with the alternatives proposed by [~rcmuir]).

 

Maybe the increased interest in this topic signals that there is something to be done?

Recent research has also taken note, for example:
 (From the abstract:) "[...] However, currently deployed IR technologies, e.g., Apache Lucene - open-source search software, are insufficient when the information is protected or deemed to be private [...]"
 (Source: [https://www.computer.org/csdl/journal/tq/5555/01/08954811/1gs4XOshKHC])



> Directory based approach for index encryption
> ---------------------------------------------
>
>                 Key: LUCENE-9379
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9379
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Bruno Roustant
>            Assignee: Bruno Roustant
>            Priority: Major
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> +Important+: This Lucene Directory wrapper approach is to be considered only if OS-level encryption is not possible. OS-level encryption better fits Lucene's usage of the OS cache, and is thus more performant.
> But there are some use cases where OS-level encryption is not possible. This Jira issue was created to address those.
> ____________________________________________
>  
> The goal is to provide optional encryption of the index, with a scope limited to an encryptable Lucene Directory wrapper.
> Encryption is at rest on disk, not in memory.
> This simple approach should fit any Codec as it would be orthogonal, without modifying APIs as much as possible.
> Use a standard encryption method. Limit perf/memory impact as much as possible.
> Determine how callers provide encryption keys. They must not be stored on disk.
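
To make the goals in the quoted description a bit more concrete, here is a minimal sketch of the kind of cipher handling a directory-level wrapper could rely on. It is not taken from the patch on this issue and does not use any Lucene API. The choice of AES in CTR mode, the class name EncryptedFileSketch, the helpers counterFor and crypt, and the idea of keeping a per-file IV next to the data are all assumptions made for illustration. What it tries to show is that a standard JDK cipher can decrypt an arbitrary byte range of a file (which an index reader needs when it seeks), while the key lives only in memory and is supplied by the caller.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: AES/CTR with random access. Not the code from LUCENE-9379.
public class EncryptedFileSketch {
    static final int BLOCK = 16; // AES block size in bytes

    // Counter value for a given block: IV + blockIndex, as a 128-bit big-endian integer.
    static byte[] counterFor(byte[] iv, long blockIndex) {
        BigInteger sum = new BigInteger(1, iv)
            .add(BigInteger.valueOf(blockIndex))
            .mod(BigInteger.ONE.shiftLeft(128));
        byte[] raw = sum.toByteArray(); // may be shorter than 16 bytes or carry a sign byte
        byte[] counter = new byte[BLOCK];
        int len = Math.min(raw.length, BLOCK);
        System.arraycopy(raw, raw.length - len, counter, BLOCK - len, len);
        return counter;
    }

    // Encrypts or decrypts (CTR is symmetric) `data` located at byte `offset` of the file.
    static byte[] crypt(byte[] key, byte[] iv, long offset, byte[] data) throws Exception {
        int skip = (int) (offset % BLOCK); // position inside the first block
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE,
            new SecretKeySpec(key, "AES"),
            new IvParameterSpec(counterFor(iv, offset / BLOCK)));
        // Feed `skip` dummy bytes first so the keystream lines up with `offset`.
        byte[] padded = new byte[skip + data.length];
        System.arraycopy(data, 0, padded, skip, data.length);
        byte[] out = cipher.doFinal(padded);
        return Arrays.copyOfRange(out, skip, out.length);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // provided by the caller, kept in memory only
        random.nextBytes(key);
        byte[] iv = new byte[16];  // assumed to be stored alongside the encrypted file
        random.nextBytes(iv);

        byte[] plain = "stored fields, postings and other index data"
            .getBytes(StandardCharsets.UTF_8);
        byte[] onDisk = crypt(key, iv, 0, plain); // what would be written to disk

        // Decrypt only bytes [21, 31) without touching the rest of the file.
        int offset = 21;
        int length = 10;
        byte[] slice = crypt(key, iv, offset,
            Arrays.copyOfRange(onDisk, offset, offset + length));
        System.out.println(new String(slice, StandardCharsets.UTF_8));
    }
}
```

A real wrapper would additionally have to decide how IVs are generated and stored per file, how keys are rotated, and how to buffer reads so that the cipher cost stays low; none of that is covered by this sketch.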


