Posted to java-user@lucene.apache.org by Kumaran Ramasubramanian <ku...@gmail.com> on 2017/08/07 06:30:52 UTC

Encryption at lucene index

Hi All,


After looking at all the discussions below, I have one question, which may be
silly or naive, but I want to put it to the Lucene user list.

If we include an encryption layer in our analyzer's chain of filters, such as
an EncryptionFilter, to control field-level encryption, what are the
consequences? Am I missing anything basic?

Thanks in advance..


Related links:

https://issues.apache.org/jira/browse/LUCENE-2228 : AES Encrypted Directory
(in Lucene 3.x)

https://issues.apache.org/jira/browse/LUCENE-6966 : Codec for index-level
encryption (at the codec level, to control which columns / fields hold
personally identifiable information)

https://security.stackexchange.com/questions/111153/is-a-lucene-search-index-effectively-a-backdoor-for-field-level-encryption


> A decent encrypting algorithm will not produce, say, the same first portion
> for two tokens that start with the same letters. So wildcard searches won't
> work. Consider "runs", "running", "runner". A search on "run*" would be
> expected to match all three, but wouldn't unless the encryption were so
> trivial as to be useless. Similar issues arise with sorting. "More Like
> This" would be unreliable. There are many other features of a robust search
> engine that would be impacted, and an index with encrypted terms would be
> useful for only exact matches, which usually results in a poor search
> experience.
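
The sorting point in the quote above can be illustrated with a small sketch. This is not Lucene code: it assumes a deterministic keyed transform (HMAC-SHA256) as a stand-in for per-token encryption, and the key, the `protect` helper, and the word list are all made up for the illustration. Sorting by the protected terms is very unlikely to reproduce the alphabetical order of the original tokens.

```python
import hashlib
import hmac

# Hypothetical key, for illustration only.
KEY = b"demo-key-not-for-production"


def protect(token: str) -> str:
    """Deterministic keyed transform standing in for per-token encryption."""
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


tokens = ["apple", "banana", "cherry", "grape", "lemon", "mango", "peach", "plum"]

# Order a user expects: alphabetical on the original tokens.
plain_order = sorted(tokens)

# Order an index can offer if it only holds the protected terms:
# it can sort on the protected bytes, nothing else.
protected_order = sorted(tokens, key=protect)

print("plaintext order:", plain_order)
print("protected order:", protected_order)
```

The same tokens come back, but in an order determined by the protected values, so sorting and range queries on such a field lose their meaning while exact lookups still work.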


https://stackoverflow.com/questions/36604551/adding-encryption-to-solr-lucene-indexes






--
Kumaran R

Re: Encryption at lucene index

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
I got it Erick.. Thank you..

--
Kumaran R


Re: Encryption at lucene index

Posted by Erick Erickson <er...@gmail.com>.
Encrypting the _tokens_ inevitably leads to reduced capabilities BTW.
Trivial example:
I have these tokens in my index
run
runner
running
runs

Any non-trivial encryption algorithm will not encrypt the first three
letters "run" identically in all four tokens, so searching for run* simply
won't work.
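
A minimal sketch of this, not tied to Lucene's API: HMAC-SHA256 is used here as a stand-in for a deterministic per-token encryption step, and the key and helper names are assumptions made up for the illustration. (With a randomized cipher, even the exact match would stop working.)

```python
import hashlib
import hmac

# Hypothetical key, for illustration only.
KEY = b"demo-key-not-for-production"


def protect(token: str) -> str:
    """Deterministic keyed transform standing in for per-token encryption."""
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


tokens = ["run", "runner", "running", "runs"]

# Term dictionary of a plaintext index and of a "protected" index.
plain_terms = sorted(tokens)
protected_terms = {protect(t): t for t in tokens}


def plain_prefix_search(prefix: str) -> list:
    """run* on the plaintext terms: a simple prefix scan."""
    return [t for t in plain_terms if t.startswith(prefix)]


def protected_prefix_search(prefix: str) -> list:
    """run* on the protected terms: the best we can do is transform the
    prefix the same way and scan for terms that start with the result."""
    needle = protect(prefix)
    return sorted(t for p, t in protected_terms.items() if p.startswith(needle))


print(plain_prefix_search("run"))
print(protected_prefix_search("run"))
```

The plaintext scan finds all four tokens. The protected scan finds only the token that is exactly "run", because the transformed prefix shares nothing with the transformed forms of "runner", "running", or "runs".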

As you can see, there's quite a bit of back-and-forth on that JIRA
and it has pretty much been abandoned.

Best,
Erick


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Encryption at lucene index

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
Hi Ishan, thank you :-)

--
Kumaran R


Re: Encryption at lucene index

Posted by Ishan Chattopadhyaya <ic...@gmail.com>.
Harry Ochiai (Hitachi) has an index encryption solution:
https://www.slideshare.net/maggon/securing-solr-search-data-in-the-cloud
I think it is proprietary, but I'm not sure. Some more googling might help
find the exact page where his solution is described.


Re: Encryption at lucene index

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
Hi Erick, I want to encrypt some fields of a document that contain personally
identifiable information (both indexed and stored data), e.g. email,
mobile number, etc. LUCENE-6966 is the only thing I could find while googling.
Are there any related pointers for Solr or the latest Lucene version?
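
The "both indexed and stored" requirement matters because a filter in the analysis chain only ever sees the tokens that go into the inverted index. The stored copy of the field is kept separately and is untouched by it. A toy model of that split follows. It is not Lucene's API: `ToyIndex`, `protect_token`, the key, and the sample data are all hypothetical, and HMAC-SHA256 stands in for a deterministic token-encryption step.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical key, for illustration only.
KEY = b"demo-key-not-for-production"


def protect_token(token: str) -> str:
    """Stand-in for an encrypting token filter in the analysis chain."""
    return hmac.new(KEY, token.encode("utf-8"), hashlib.sha256).hexdigest()


class ToyIndex:
    """Toy model: 'postings' plays the inverted index, 'stored' the stored fields."""

    def __init__(self, token_filter):
        self.token_filter = token_filter
        self.postings = defaultdict(set)
        self.stored = {}

    def add(self, doc_id: int, text: str) -> None:
        # The stored value is saved as given; the token filter never sees it.
        self.stored[doc_id] = text
        # Only the analyzed tokens pass through the filter.
        for token in text.lower().split():
            self.postings[self.token_filter(token)].add(doc_id)

    def search(self, term: str) -> list:
        key = self.token_filter(term.lower())
        return sorted(self.postings.get(key, set()))


index = ToyIndex(protect_token)
index.add(1, "alice@example.com 5550100")

print(index.search("alice@example.com"))  # exact match on the protected term
print(index.stored[1])                    # stored copy is still plaintext
```

In this model the indexed terms are protected, yet the stored field still holds the email address and phone number in the clear, so stored data would need its own encryption step outside the analyzer.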


--
Kumaran R


Re: Encryption at lucene index

Posted by Erick Erickson <er...@gmail.com>.
No, since you haven't defined what you want to encrypt, what your
requirements are, what you hope to get out of "encryption" etc.

Put the index on an encrypting filesystem and forget about it if you
possibly can, because anything else is a significant amount of work.
To encrypt the searchable tokens on a per-user basis in memory is a
_lot_ of work. It depends on your security needs.

Otherwise, as I said, please ask specific questions, as the topic is
quite large, much too large to conduct a seminar through the users'
list.

Best,
Erick

On Mon, Aug 7, 2017 at 9:07 AM, Kumaran Ramasubramanian
<ku...@gmail.com> wrote:
> Hi Erick,
>
>     Thanks for the information. Any pointers about encryption options in
> solr?
>
>
> --
> Kumaran R
>
>
>
> On Mon, Aug 7, 2017 at 9:17 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
>> Encryption in Solr has a bunch of ramifications. Do you care about
>>
>> - encryption at rest or in memory?
>> - encrypting the _searchable_ tokens?
>> - encrypting the searchable tokens per-user?
>> - encrypting the stored data (which a filter won't do BTW).
>>
>> It's actually a fairly complex topic the discussion at LUCENE-6966
>> outlines much of it. Please ask specific questions as you research the
>> topic. One  per-user encryption package that I know of is by Hitachi
>> Solutions (commercial) and it explicitly does _not_ support, for
>> instance, wildcards (there are other limitations too). See:
>> http://www.hitachi-solutions.com/securesearch/
>>
>> Most of the time when people ask for encryption they soon discover
>> it's much more difficult than they imagine and settle for just putting
>> the indexes on an encrypting file system. When they move beyond that
>> it gets complex and you'd be well advised to consult with Solr
>> security experts.
>>
>> Best,
>> Erick
>>
>> On Sun, Aug 6, 2017 at 11:30 PM, Kumaran Ramasubramanian
>> <ku...@gmail.com> wrote:
>> > Hi All,
>> >
>> >
>> > After looking at all below discussions, i have one doubt which may be
>> silly
>> > or novice but i want to throw this to lucene user list.
>> >
>> > if we have encryption layer included in our analyzer's flow of filters
>> like
>> > EncryptionFilter to control field-level encryption. what are the
>> > consequences ? am i missing anything basic?
>> >
>> > Thanks in advance..
>> >
>> >
>> > Related links:
>> >
>> > https://issues.apache.org/jira/browse/LUCENE-2228 : AES Encrypted
>> Directory
>> > - in lucene 3.x
>> >
>> > https://issues.apache.org/jira/browse/LUCENE-6966 :  Codec for
>> index-level
>> > encryption - at codec level, to have control on which column / field have
>> >  personal identifiable information
>> >
>> > https://security.stackexchange.com/questions/111153/is-a-lucene-search-index-effectively-a-backdoor-for-field-level-encryption
>> >
>> >
>> > A decent encrypting algorithm will not produce, say, the same first
>> portion
>> >> for two tokens that start with the same letters. So wildcard searches
>> won't
>> >> work. Consider "runs", "running", "runner". A search on "run*" would be
>> >> expected to match all three, but wouldn't unless the encryption were so
>> >> trivial as to be useless. Similar issues arise with sorting. "More Like
>> >> This" would be unreliable. There are many other features of a robust
>> search
>> >> engine that would be impacted, and an index with encrypted terms would
>> be
>> >> useful for only exact matches, which usually results in a poor search
>> >> experience.
>> >
>> >
>> > https://stackoverflow.com/questions/36604551/adding-encryption-to-solr-lucene-indexes
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> > Kumaran R
>>
>>
>>



Re: Encryption at lucene index

Posted by Kumaran Ramasubramanian <ku...@gmail.com>.
Hi Erick,

    Thanks for the information. Any pointers on encryption options in
Solr?


--
Kumaran R



On Mon, Aug 7, 2017 at 9:17 PM, Erick Erickson <er...@gmail.com>
wrote:

> Encryption in Solr has a bunch of ramifications. Do you care about
>
> - encryption at rest or in memory?
> - encrypting the _searchable_ tokens?
> - encrypting the searchable tokens per-user?
> - encrypting the stored data (which a filter won't do BTW).
>
> It's actually a fairly complex topic the discussion at LUCENE-6966
> outlines much of it. Please ask specific questions as you research the
> topic. One  per-user encryption package that I know of is by Hitachi
> Solutions (commercial) and it explicitly does _not_ support, for
> instance, wildcards (there are other limitations too). See:
> http://www.hitachi-solutions.com/securesearch/
>
> Most of the time when people ask for encryption they soon discover
> it's much more difficult than they imagine and settle for just putting
> the indexes on an encrypting file system. When they move beyond that
> it gets complex and you'd be well advised to consult with Solr
> security experts.
>
> Best,
> Erick
>
> On Sun, Aug 6, 2017 at 11:30 PM, Kumaran Ramasubramanian
> <ku...@gmail.com> wrote:
> > Hi All,
> >
> >
> > After looking at all below discussions, i have one doubt which may be
> silly
> > or novice but i want to throw this to lucene user list.
> >
> > if we have encryption layer included in our analyzer's flow of filters
> like
> > EncryptionFilter to control field-level encryption. what are the
> > consequences ? am i missing anything basic?
> >
> > Thanks in advance..
> >
> >
> > Related links:
> >
> > https://issues.apache.org/jira/browse/LUCENE-2228 : AES Encrypted
> Directory
> > - in lucene 3.x
> >
> > https://issues.apache.org/jira/browse/LUCENE-6966 :  Codec for
> index-level
> > encryption - at codec level, to have control on which column / field have
> >  personal identifiable information
> >
> > https://security.stackexchange.com/questions/111153/is-a-lucene-search-index-effectively-a-backdoor-for-field-level-encryption
> >
> >
> > A decent encrypting algorithm will not produce, say, the same first
> portion
> >> for two tokens that start with the same letters. So wildcard searches
> won't
> >> work. Consider "runs", "running", "runner". A search on "run*" would be
> >> expected to match all three, but wouldn't unless the encryption were so
> >> trivial as to be useless. Similar issues arise with sorting. "More Like
> >> This" would be unreliable. There are many other features of a robust
> search
> >> engine that would be impacted, and an index with encrypted terms would
> be
> >> useful for only exact matches, which usually results in a poor search
> >> experience.
> >
> >
> > https://stackoverflow.com/questions/36604551/adding-encryption-to-solr-lucene-indexes
> >
> >
> >
> >
> >
> >
> > --
> > Kumaran R
>
>
>

Re: Encryption at lucene index

Posted by Erick Erickson <er...@gmail.com>.
Encryption in Solr has a bunch of ramifications. Do you care about

- encryption at rest or in memory?
- encrypting the _searchable_ tokens?
- encrypting the searchable tokens per-user?
- encrypting the stored data (which a filter won't do BTW).
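
On the last point: an analysis filter only changes the tokens that get
indexed; a stored field value is written as supplied. If stored values
need protecting, one option is to encrypt them in the application before
building the document. A minimal sketch, assuming AES-GCM from the JDK
and a key obtained elsewhere. The class and method names
(StoredValueCrypto, seal, open) are made up for illustration and are not
part of Lucene or Solr:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: encrypts a field value before it is stored.
public class StoredValueCrypto {
    private static final int IV_BYTES = 12;   // usual nonce size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns IV || ciphertext+tag. A fresh random IV is drawn on every call.
    public static byte[] seal(byte[] key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ct.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ct, 0, out, iv.length, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses seal(): reads the IV from the front, then decrypts and verifies.
    public static String open(byte[] key, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];  // demo only: all-zero key, never use in practice
        byte[] sealed = seal(key, "Jane Doe, 555-0100");
        System.out.println(open(key, sealed));
    }
}
```

The sealed bytes could then go into a binary stored field (Lucene's
StoredField has a byte[] constructor). Such a value can be retrieved and
decrypted by the application, but it is not searchable.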

It's actually a fairly complex topic; the discussion at LUCENE-6966
outlines much of it. Please ask specific questions as you research the
topic. One per-user encryption package that I know of is by Hitachi
Solutions (commercial) and it explicitly does _not_ support, for
instance, wildcards (there are other limitations too). See:
http://www.hitachi-solutions.com/securesearch/
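
To make the wildcard limitation concrete, here is a small sketch. It
assumes tokens are replaced by a deterministic keyed transform;
HMAC-SHA256 stands in for whatever a hypothetical EncryptionFilter would
apply, and is not taken from any of the packages mentioned. Exact
lookups still line up, but the transformed "run" shares no usable prefix
with the transformed "runs", "running" or "runner":

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class, not part of Lucene or Solr.
public class EncryptedTokenDemo {
    // Deterministic keyed transform of one token, hex-encoded.
    public static String protect(byte[] key, String token) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(token.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts terms that start with the given prefix, as a prefix query would.
    public static int countPrefixMatches(List<String> terms, String prefix) {
        int matches = 0;
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        List<String> plain = Arrays.asList("runs", "running", "runner");
        List<String> protectedTerms = new ArrayList<>();
        for (String term : plain) {
            protectedTerms.add(protect(key, term));
        }

        System.out.println("plain run*      : "
                + countPrefixMatches(plain, "run"));
        System.out.println("protected run*  : "
                + countPrefixMatches(protectedTerms, protect(key, "run")));
        System.out.println("protected exact : "
                + protectedTerms.contains(protect(key, "runner")));
    }
}
```

The same reasoning applies to sorting and range queries: the order of
the transformed terms has nothing to do with the order of the originals.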

Most of the time when people ask for encryption they soon discover
it's much more difficult than they imagine and settle for just putting
the indexes on an encrypting file system. When they move beyond that
it gets complex and you'd be well advised to consult with Solr
security experts.

Best,
Erick

On Sun, Aug 6, 2017 at 11:30 PM, Kumaran Ramasubramanian
<ku...@gmail.com> wrote:
> Hi All,
>
>
> After looking at all below discussions, i have one doubt which may be silly
> or novice but i want to throw this to lucene user list.
>
> if we have encryption layer included in our analyzer's flow of filters like
> EncryptionFilter to control field-level encryption. what are the
> consequences ? am i missing anything basic?
>
> Thanks in advance..
>
>
> Related links:
>
> https://issues.apache.org/jira/browse/LUCENE-2228 : AES Encrypted Directory
> - in lucene 3.x
>
> https://issues.apache.org/jira/browse/LUCENE-6966 :  Codec for index-level
> encryption - at codec level, to have control on which column / field have
>  personal identifiable information
>
> https://security.stackexchange.com/questions/111153/is-a-lucene-search-index-effectively-a-backdoor-for-field-level-encryption
>
>
> A decent encrypting algorithm will not produce, say, the same first portion
>> for two tokens that start with the same letters. So wildcard searches won't
>> work. Consider "runs", "running", "runner". A search on "run*" would be
>> expected to match all three, but wouldn't unless the encryption were so
>> trivial as to be useless. Similar issues arise with sorting. "More Like
>> This" would be unreliable. There are many other features of a robust search
>> engine that would be impacted, and an index with encrypted terms would be
>> useful for only exact matches, which usually results in a poor search
>> experience.
>
>
> https://stackoverflow.com/questions/36604551/adding-encryption-to-solr-lucene-indexes
>
>
>
>
>
>
> --
> Kumaran R
