Posted to java-user@lucene.apache.org by aravinth thangasami <ar...@gmail.com> on 2017/12/04 17:57:35 UTC

Encryption At Rest - Using CustomAnalyzer

Hi all,

To support encryption at rest, we have written a custom analyzer that
encrypts every token in the input string and then passes it on to the
default indexing chain.

We are using AES/CTR/NoPadding with a unique key per user.
With this setup, input strings that share a common prefix also produce
encrypted strings that share a common prefix, so we can still perform
prefix queries.
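A minimal sketch of the per-token encryption step, using only the JDK's `javax.crypto`. The post doesn't say how the IV (initial counter block) is chosen; CTR mode only yields common ciphertext prefixes if every token is encrypted with the same key and the same starting counter, so this sketch assumes a fixed all-zero IV. The class name `TokenEncryptor`, the demo key and the hex output are illustrative assumptions, not the poster's code; in a real analyzer this logic would presumably sit inside a custom Lucene `TokenFilter`.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: deterministic AES/CTR encryption of a single token.
final class TokenEncryptor {
    // Assumption: the same all-zero counter block for every token. CTR only
    // preserves common prefixes when the key AND the IV are reused.
    private static final byte[] FIXED_IV = new byte[16];

    private final SecretKeySpec key;

    TokenEncryptor(byte[] userKey) {
        this.key = new SecretKeySpec(userKey, "AES"); // 16, 24 or 32 bytes
    }

    byte[] encrypt(String token) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(FIXED_IV));
            // CTR is a stream mode: ciphertext = plaintext XOR keystream,
            // so the output has exactly as many bytes as the UTF-8 token.
            return cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] demoKey = new byte[16]; // hypothetical all-zero demo key; a real per-user key comes from a key store
        TokenEncryptor enc = new TokenEncryptor(demoKey);
        for (String token : new String[] {"run", "runs", "running"}) {
            System.out.println(token + " -> " + hex(enc.encrypt(token)));
        }
    }
}
```

Running `main` prints one hex term per token. Because every token is XORed against the same keystream, the terms for `runs` and `running` start with the term for `run`.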

For example,

run       x5X7
runs      x5X7tg==
running   x5X7q/nE5g==
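The `==` endings in the example are Base64 padding rather than ciphertext. Base64 encodes each 3-byte group as 4 characters, so an encrypted prefix is only a string prefix of the encrypted term when the plaintext prefix is a multiple of 3 bytes long; a padded Base64 prefix for `ru` can never match `running`. The JDK-only check below assumes a fixed all-zero IV and an all-zero demo key (both hypothetical) and compares Base64 with hex, which spends exactly two characters per byte and so keeps every byte prefix.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class EncodingPrefixCheck {
    // Hypothetical demo key and fixed IV; never use all-zero keys in production.
    private static final byte[] KEY = new byte[16];
    private static final byte[] IV = new byte[16];

    static byte[] encrypt(String token) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
            return cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String base64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes); // pads with '=' to a multiple of 4 chars
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff)); // exactly two chars per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ru = encrypt("ru");
        byte[] run = encrypt("run");
        byte[] running = encrypt("running");
        // 3-byte prefix: one full Base64 group, so the prefix survives.
        System.out.println(base64(running).startsWith(base64(run)));  // true
        // 2-byte prefix: encodes to "xxx=", and '=' never occurs mid-string.
        System.out.println(base64(running).startsWith(base64(ru)));   // false
        // Hex keeps every byte prefix as a string prefix.
        System.out.println(hex(running).startsWith(hex(ru)));         // true
    }
}
```

Hex makes terms longer than Base64 (2x versus roughly 1.33x the ciphertext size), but it removes the 3-byte alignment problem for prefix queries.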


At search time, we preprocess (encrypt) the query terms for the encrypted
field before searching. With this approach, however, we can't run wildcard
or fuzzy queries.
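A sketch of the search-time preprocessing, assuming the index stores hex-encoded AES/CTR ciphertext with a fixed IV (both assumptions; the post itself shows Base64). The hypothetical `QueryTermRewriter` encrypts the user's plaintext term so the result can be used for an exact-term or prefix lookup on the encrypted field (in Lucene, presumably a `TermQuery` or `PrefixQuery`). Wildcard and fuzzy patterns are rejected, because under CTR the ciphertext of a fragment depends on its byte offset, so a pattern like `r*ing` has no fixed ciphertext counterpart.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical search-time rewriter for one encrypted field and one user key.
final class QueryTermRewriter {
    enum Kind { TERM, PREFIX, WILDCARD, FUZZY }

    // Assumption: must be the same fixed IV the analyzer used at index time.
    private static final byte[] FIXED_IV = new byte[16];
    private final SecretKeySpec userKey;

    QueryTermRewriter(byte[] key) {
        this.userKey = new SecretKeySpec(key, "AES");
    }

    /** Returns the hex-encoded ciphertext to search for, or throws if the kind cannot be mapped. */
    String rewrite(Kind kind, String plaintext) {
        switch (kind) {
            case TERM:   // whole token: exact term lookup
            case PREFIX: // encrypted prefix is a prefix of every matching encrypted term
                return hex(encrypt(plaintext));
            default:
                // CTR output depends on byte offset: "ing" at offset 4 of "running"
                // is XORed with different keystream bytes than "ing" encrypted alone.
                throw new UnsupportedOperationException(kind + " has no ciphertext equivalent");
        }
    }

    private byte[] encrypt(String text) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, userKey, new IvParameterSpec(FIXED_IV));
            return cipher.doFinal(text.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        QueryTermRewriter rewriter = new QueryTermRewriter(new byte[16]); // hypothetical demo key
        String prefix = rewriter.rewrite(Kind.PREFIX, "run");
        String indexed = rewriter.rewrite(Kind.TERM, "running");
        System.out.println(indexed.startsWith(prefix)); // true
        try {
            rewriter.rewrite(Kind.WILDCARD, "r*ing");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // WILDCARD has no ciphertext equivalent
        }
    }
}
```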


Has anyone tried this approach?
Please post your suggestions and any approaches you have tried.


Thanks
Aravinth

Re: Encryption At Rest - Using CustomAnalyzer

Posted by András Péteri <ap...@b2international.com>.
Hi Aravinth,

There is an open issue about encrypting index files using AES; I don't know
whether it would fit your requirements:
https://issues.apache.org/jira/browse/LUCENE-2228

Regards,
András

On Tue, Feb 6, 2018 at 8:32 AM, Michael Wilkowski <mw...@silenteight.com> wrote:
> Hi,
> sorry to say this, but your encryption is not secure at all; it is actually
> very weak. Since you encrypt individual tokens only (and apply padding), it
> is very easy, based on the examples above, to reverse engineer your text.
> If somebody understands the domain, knows the text distribution and can
> build a so-called word2vec model, he/she can easily use it to build a
> reverse dictionary of your tokens.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Encryption At Rest - Using CustomAnalyzer

Posted by Michael Wilkowski <mw...@silenteight.com>.
Hi,
sorry to say this, but your encryption is not secure at all; it is actually
very weak. Since you encrypt individual tokens only (and apply padding), it
is very easy, based on the examples above, to reverse engineer your text.
If somebody understands the domain, knows the text distribution and can
build a so-called word2vec model, he/she can easily use it to build a
reverse dictionary of your tokens.
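One concrete reason the scheme is weak, beyond the frequency analysis described above: to keep prefixes, every token reuses the same CTR keystream, so `C1 XOR C2 = P1 XOR P2`. Anyone who guesses one token (for example a frequent word) can read the overlapping bytes of any other token encrypted under the same user key, without knowing the key. A JDK-only sketch, assuming a fixed all-zero IV and a random demo key; the class name is hypothetical and this is not the poster's code.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Demonstrates keystream reuse when every token shares one key and one (hypothetical) fixed IV.
final class KeystreamReuseDemo {
    private static final byte[] FIXED_IV = new byte[16];

    static byte[] encrypt(byte[] key, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(FIXED_IV));
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recovers the overlapping bytes of an unknown token from one known plaintext/ciphertext pair; no key needed. */
    static String recover(byte[] knownPlain, byte[] knownCipher, byte[] targetCipher) {
        int n = Math.min(knownCipher.length, targetCipher.length);
        byte[] out = new byte[n];
        for (int i = 0; i < n; i++) {
            // keystream[i] = knownPlain[i] ^ knownCipher[i]; target[i] = targetCipher[i] ^ keystream[i]
            out[i] = (byte) (targetCipher[i] ^ knownCipher[i] ^ knownPlain[i]);
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        new SecureRandom().nextBytes(key); // the attacker never sees this key
        byte[] guessed = "running".getBytes(StandardCharsets.UTF_8); // e.g. a word guessed from term statistics
        byte[] guessedCipher = encrypt(key, guessed);
        byte[] secretCipher = encrypt(key, "invoice".getBytes(StandardCharsets.UTF_8));
        System.out.println(recover(guessed, guessedCipher, secretCipher)); // invoice
    }
}
```

A random IV per token would stop this, but it would also destroy the prefix property and exact-match lookups; that is the basic trade-off of deterministic, searchable encryption.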

On the other hand, this means it should not be so difficult to build
wildcard queries (at least with the asterisk at the end of the word, not at
the beginning). Check how FuzzyQuery works right now: it is quite easy to
understand and straightforward when you look at the source code. I built
my own version of FuzzyQuery some time ago based on the MultiTermQuery class.
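The MultiTermQuery idea can be sketched as decrypt-and-filter: instead of translating a wildcard or fuzzy pattern into ciphertext, enumerate the field's encrypted terms, decrypt each with the user's key and keep those whose plaintext matches. In Lucene that filtering would presumably live in a `FilteredTermsEnum` returned by a custom `MultiTermQuery` subclass. The JDK-only version below is not the poster's implementation: a `List<byte[]>` stands in for the terms dictionary, and the class name, demo key and fixed IV are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical decrypt-then-filter matcher; a List<byte[]> stands in for the terms dictionary.
final class DecryptingTermFilter {
    private static final byte[] FIXED_IV = new byte[16]; // assumption: same fixed IV as index time

    static byte[] crypt(int mode, byte[] key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(FIXED_IV));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Converts a simple wildcard ('*' = any run of chars, '?' = exactly one char) into a regex. */
    static Pattern wildcard(String pattern) {
        StringBuilder re = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    /** Levenshtein distance (insertions, deletions, substitutions) with a two-row DP table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** Decrypts every encrypted term and returns the plaintexts the predicate accepts. */
    static List<String> matching(byte[] key, List<byte[]> encryptedTerms, Predicate<String> accept) {
        List<String> hits = new ArrayList<>();
        for (byte[] term : encryptedTerms) {
            String plain = new String(crypt(Cipher.DECRYPT_MODE, key, term), StandardCharsets.UTF_8);
            if (accept.test(plain)) {
                hits.add(plain);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // hypothetical demo key
        List<byte[]> terms = new ArrayList<>();
        for (String t : new String[] {"run", "runs", "running", "ruin", "sun"}) {
            terms.add(crypt(Cipher.ENCRYPT_MODE, key, t.getBytes(StandardCharsets.UTF_8)));
        }
        Pattern p = wildcard("*un*");
        System.out.println(matching(key, terms, s -> p.matcher(s).matches()));      // [run, runs, running, sun]
        System.out.println(matching(key, terms, s -> levenshtein(s, "runn") <= 1)); // [run, runs, ruin]
    }
}
```

The cost is one decryption per term in the field on every such query, and plaintext terms briefly exist in memory, so this moves the restriction from index time to query-time work.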

MW

*Michael Wilkowski*
Chief Technology Officer, Silent Eight Pte Ltd

+48 600 995 603 | mw@silenteight.com

www.silenteight.com

On Tue, Feb 6, 2018 at 3:42 AM, aravinth thangasami <
aravinththangasami@gmail.com> wrote:

> Kindly post your suggestions.

Re: Encryption At Rest - Using CustomAnalyzer

Posted by aravinth thangasami <ar...@gmail.com>.
Kindly post your suggestions.


