Posted to dev@lucene.apache.org by negrinv <vi...@gmail.com> on 2006/11/29 21:35:45 UTC

Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Attached are proposed modifications to Lucene 2.0 to support
Field.Store.Encrypted.
The rationale behind this proposal is simple. Since Lucene can store data in
the index, it effectively makes the data portable. It is conceivable that
some of the data may be sensitive in nature, hence the option to encrypt it.
Both the data and its index are encrypted in this implementation.
This is only an initial implementation. It has the following
restrictions, all of which can be resolved if required, albeit with some
effort and more changes to Lucene:
1) Binary and compressed fields cannot also be encrypted (a plaintext,
once encrypted, becomes binary).
2) Field.Store.Encrypted implies Field.Store.Yes.
This makes sense, but it forces one to store the data in the same index where
the tokens are stored. It may be preferable at times to have two indexes,
one for tokens, the other for the data.
3) As implemented, it uses RC4 encryption from BouncyCastle. This is an open
source package, very simple to use, which has the advantage of guaranteeing
that the length of the encrypted field is the same as that of the original
plaintext. As of Java 1.5 (5.0) Sun provides an RC4 equivalent in its Java
Cryptography Extension, but unfortunately not in Java 1.4.
The BouncyCastle RC4 is not the only algorithm available; others that do not
depend on third-party code can be used, but it was the simplest to
implement for this first attempt.
4) The attachments are modifications in diff form based on an early (I
think August or September '06) repository snapshot of Lucene 2.0,
subsequently updated from the Lucene repository on 29/11/06. They may need
some additional work to merge with the latest version in the Lucene
repository. They also include a couple of JUnit test programs which explain,
as well as test, the usage. You will need the BouncyCastle .jar
(bcprov-jdk14-134.jar) to run them. I did not attach it, to minimize the size
of the attachments, but it can be downloaded free from:
 http://www.bouncycastle.org/latest_releases.html
 
5) Searching an encrypted field is restricted to single terms; no phrase or
boolean searches are allowed yet, and the term has to be encrypted by the
application before searching for it (see the attached JUnit test programs).
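
For illustration only (this is not the code from the attached diff): a
minimal sketch of the two properties that restrictions 3) and 5) rely on,
namely that RC4 output has the same length as its input and that the same
password and term always encrypt to the same bytes. It assumes the JDK's
"ARCFOUR" cipher rather than BouncyCastle; the class name, method names and
password are hypothetical. RC4 is considered weak today, so this shows the
mechanics and is not a recommendation.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of Lucene or of the attached patch.
final class Rc4FieldSketch {

    // Runs the JDK's RC4 implementation (JCE name "ARCFOUR") in the given mode.
    private static byte[] rc4(int mode, byte[] key, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(mode, new SecretKeySpec(key, "ARCFOUR"));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("ARCFOUR cipher unavailable", e);
        }
    }

    static byte[] encrypt(byte[] key, byte[] plain) {
        return rc4(Cipher.ENCRYPT_MODE, key, plain);
    }

    static byte[] decrypt(byte[] key, byte[] encrypted) {
        return rc4(Cipher.DECRYPT_MODE, key, encrypted);
    }

    public static void main(String[] args) {
        // Assumed 16-byte password; a real application would derive the key properly.
        byte[] key = "demo-password-16".getBytes(StandardCharsets.UTF_8);
        byte[] term = "confidential".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(key, term);

        System.out.println("same length:   " + (encrypted.length == term.length));
        System.out.println("deterministic: " + Arrays.equals(encrypted, encrypt(key, term)));
        System.out.println("round trip:    " + Arrays.equals(term, decrypt(key, encrypted)));
    }
}
```

The determinism is what lets an application encrypt a query term and match
it against the encrypted index terms. It also means identical plaintext
terms remain recognisable as identical in the index.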

To the extent that I have tested it, the code works as intended and does not
appear to introduce any regression problems, but more testing by others
would be desirable.
I don't propose at this stage to do any further work on these API
extensions unless there is some expression of interest and direction from
the Lucene Developers team. I have an application ready to roll which uses
the proposed Lucene encryption API additions (please see
http://www.kbforge.com/index.html). The application is not yet available for
downloading simply because I am not sure if the Lucene licence allows me to
do so. I would appreciate your advice in this regard. My application is free
but its source code is not available (yet). I should add that encryption
does not have to be an integral part of Lucene, it can be just part of the
end application, but somehow it seems to me that Field.Store.Encrypted
belongs in the same category as compression and binary values.
I would be happy to receive your feedback.

victor negrin

http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt 
http://www.nabble.com/file/4377/TestEncryptedDocument.java
TestEncryptedDocument.java 
http://www.nabble.com/file/4378/TestDocument.java TestDocument.java 
-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
Luke, I should have mentioned in my earlier posting that what I am proposing
uses password-based encryption, where the password is NOT stored anywhere
within Lucene. I purposely avoided making any reference to security (as
opposed to encryption) because I believe security to be the responsibility
of the end application, not of Lucene. Lucene, in my opinion, can only provide
encryption services. None of the encryption APIs themselves, whether written
by a third party or by Sun, can guarantee security either, which is why Lucene
cannot do so. What it can do is provide the encryption of the data and
its index. Any application using these proposed API extensions will have to
work out the extent to which it can provide security within the context of
all the other APIs involved and the application requirements themselves.
I have to agree with you that at some stage Lucene will have to stop
providing new functionality or it will become unmaintainable. But has it
reached that stage yet?
Victor

Luke Nezda wrote:
> 
> Victor-
> Your point is well taken that a comprehensive encryption strategy is not
> quite analogous to compression: it involves more than a transformation
> of field values to a more compact form, since it requires (at a minimum)
> that all data structures which comprise the index be encrypted too. Maybe
> I spoke too soon.
> 
> However, after considering this more, I think the scheme would need to be
> quite invasive to provide good security. I think just plugging in
> encryption simplistically would be very vulnerable to side channel
> attacks. It seems the attacker can get clear text terms encrypted via the
> particular index's QueryParser implementation and eventually create a
> fairly complete decryption lookup table using Lucene's data structures,
> thus undermining the security of the internal data structures (encrypted
> payloads would potentially be unaffected, unless they corresponded to
> index Terms).
> 
> Let's say this weakness is OK with you. Using the current API, I think
> you can achieve your ends by using encrypted binary field values and
> adding a trailing org.apache.lucene.analysis.TokenFilter, used at index
> and query time, that encrypts and Base64 encodes its input (which has to
> be a String). This would effectively give you an encrypted form of
> Lucene's internal data structures.
> 
> In addition to my security concerns with the concept, I also still agree
> with the related philosophical issues put forward to this point on the
> related field compression topic. It seems inevitable to me that if
> encryption support were added, application developers would eventually try
> to sell Lucene developers on adding features to it, in addition to
> supporting and maintaining it (a la configurable compression quality
> factor). A configurable, encrypting Base64 TokenFilter would also be a
> cool contrib.
> 
> Luke
> 
> On 11/29/06, negrinv <vi...@gmail.com> wrote:
>>
>>
>> Thank you Luke for your comments and the references you supplied. I read
>> through them and reached the following conclusions. There seems to be a
>> philosophical issue about the boundary between a user application and the
>> Lucene API: where should one start and the other stop?
>> The other issue is the significant difference between compression and
>> encryption.
>> As far as the first issue is concerned, it is really a matter of personal
>> choice and preference. My feeling is that as long as adding functionality
>> does not impair the performance of the API as a whole, it makes sense to
>> add it to Lucene and thus simplify the task of the application developer.
>> After all, application developers do not have to use all the features of
>> the API and always have the option of subclassing, writing a better
>> version of it if they can, or writing the functionality as part of the
>> application, even if the API provides that functionality already. The API
>> is there to make life easier for those developers who want to use it;
>> nobody "has" to use it.
>> The second issue is more technical. Compression simply compresses the
>> stored data to save storage. The index itself is not compressed, therefore
>> searching proceeds as normal. With encryption, however, you must encrypt
>> the index as well as the stored data, otherwise one could reconstruct the
>> source document from the index and thus defeat the purpose of encryption.
>> Correct me if I am wrong, but I think that encrypting the Lucene index is
>> not easy to achieve from outside of Lucene: it implies re-writing as part
>> of the application much code now part of Lucene (see issue number one
>> above), hence my preference for including it as part of the Lucene API
>> rather than as part of the application.
>> Victor
>>
>>
>> Luke Nezda wrote:
>> >
>> > I think that adding encryption support to Lucene fields is a bad idea
>> > for the same reasons adding compression was a bad idea (conclusive
>> > comments on the tail of this issue:
>> > http://issues.apache.org/jira/browse/LUCENE-648?page=all). Binary
>> > fields can be used by users to achieve this end. Maybe a contrib with
>> > utility methods would be a compromise to preserve this work and make it
>> > accessible to others, or alternatively just a FAQ entry with the sample
>> > code or references to it.
>> > Luke
>> >
> 
> 



Re: Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Mike Klaas <mi...@gmail.com>.
On 12/1/06, negrinv <vi...@gmail.com> wrote:
>
> I think we should not make too many assumptions about performance until we
> can test alternative solutions.

<>

> The small payload overhead will be amply offset in my opinion by the ability
> to be very selective about what is being encrypted, as opposed to wholesale
> encryption and decryption.

Here I disagree.  There is no point in providing encryption unless the
entire scheme is cryptographically secure.  Such determination
requires thorough knowledge about what types of information exist in
lucene and how it is all related.  If lucene is to provide encryption,
it should be in the form of a scheme in which the whole system is
secure.  Otherwise, what is the point?  Also, if users only want to
encrypt stored fields, that is easier done on client-side.
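
As a rough sketch of what "done on client-side" could look like for stored
values: the application encrypts the value itself and hands Lucene only
opaque bytes for a binary stored field. The sketch assumes AES-GCM from a
current JDK (not available in 2006); the class and method names are
hypothetical, and the Lucene call that would store the bytes is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical application-side helper; nothing here touches Lucene.
final class StoredValueCrypto {
    private static final int IV_BYTES = 12;   // recommended GCM nonce size
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns nonce || ciphertext+tag, suitable for a binary stored field.
    static byte[] seal(byte[] key, String value) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh nonce for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reverses seal(); fails if the bytes were modified or the key is wrong.
    static String open(byte[] key, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, sealed, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(sealed, IV_BYTES, sealed.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Because each value gets a random nonce, equal plaintexts produce different
stored bytes. That suits stored fields, but it also means the output cannot
be used as a searchable term.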

Selectivity might actually hurt performance, as a system in
which everything is encrypted can work with whole blocks at a time and
have fancy caching schemes in place.  But at that point, it is looking
quite similar to using lucene on an encrypted filesystem.

> Also we should look at performance in the larger
> context of all the possible reasons why users might need encryption. A large
> proportion may not be worried about performance at all.

That may be, but Lucene users are generally quite sensitive to
performance factors.  What makes you think this will not be the case
for consumers of the encryption api?

> And in final
> analysis any performance degradation is not going to be crippling, we are
> probably talking about very small percentages, either way, which, as long as
> they are known and made available, will enable users to make an informed
> decision.

I'm not sure on what you base the claim that the performance degradation
is on the order of small percentages (see your point above about making
assumptions). I don't know for certain either, but I can easily
imagine encryption of query-related data (positions, term lists, etc.)
having a huge impact on performance. In any case, there is a
benchmark suite for lucene which can be used to measure the
degradation.

-MIke



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
There is absolutely no suggestion to make any changes to the index format.
The index format would not change, whether you use encryption or not.

Chuck Williams-2 wrote:
> 
> 
> 
> Mike Klaas wrote on 12/05/2006 11:38 AM:
>> On 12/5/06, negrinv <vi...@gmail.com> wrote:
>>
>>> Chris Hostetter wrote:
>>
>>> > If the code was not already in the core, and someone asked about
>>> > adding it I would argue against doing so on the grounds that some
>>> > helpful utility methods (possibly in a contrib) would be just as
>>> > useful, and would have no performance cost for people who don't care
>>> > about compression.
>>> >
>>> Perhaps, if you look at compression on its own, but once you see
>>> compression in the context of all the other field options it makes sense
>>> to have it added to Lucene; it's about having everything in one place for
>>> ease of implementation that offsets the performance issue, in my opinion.
>>
>> Note that built-in compression is deprecated, for similar reasons as
>> are being given for the encrypted fields.
> 
> Built-in compression is also memory-hungry and slow due to the copying
> it does.  External compression is much faster, especially if you extend
> Field binary values to support a binary length parameter (which I
> submitted a patch for a long time ago).
> 
> Here is another argument against adding Field encryption to the lucene
> core.  Changes in index format make life complex for any implementations
> that deal with index files directly.  There are a number of Lucene
> sister projects that do this, plus a number of applications.
> 
> I have a fast bulk updater that directly manipulates index files and am
> busy upgrading it right now to the 2.1 index format with lockless
> commits (which is not fully documented in the new index file formats, by
> the way, e.g. the <segment>N.sM separate norm files are missing).  It's
> a pain.  In general, I think changes to Lucene index format should only
> be driven by compelling benefits.  Moving encryption from external to
> internal to get a minor application simplification is not sufficiently
> compelling to me.
> 
> Chuck
> 


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Chuck Williams <ch...@manawiz.com>.

Mike Klaas wrote on 12/05/2006 11:38 AM:
> On 12/5/06, negrinv <vi...@gmail.com> wrote:
>
>> Chris Hostetter wrote:
>
>> > If the code was not already in the core, and someone asked about
>> > adding it I would argue against doing so on the grounds that some
>> > helpful utility methods (possibly in a contrib) would be just as
>> > useful, and would have no performance cost for people who don't care
>> > about compression.
>> >
>> Perhaps, if you look at compression on its own, but once you see
>> compression in the context of all the other field options it makes sense
>> to have it added to Lucene; it's about having everything in one place for
>> ease of implementation that offsets the performance issue, in my opinion.
>
> Note that built-in compression is deprecated, for similar reasons as
> are being given for the encrypted fields.

Built-in compression is also memory-hungry and slow due to the copying
it does.  External compression is much faster, especially if you extend
Field binary values to support a binary length parameter (which I
submitted a patch for a long time ago).
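
A minimal sketch of external compression, assuming the application uses
java.util.zip directly and stores the result as a binary field value. The
class and method names are hypothetical, the Lucene calls are left out, and
the code itself makes no performance claim.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical application-side helper; the output would go into a binary field.
final class ExternalCompression {

    // Compresses the input with zlib; the level is the application's choice.
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Restores the original bytes; rejects truncated or corrupt input.
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && !inflater.finished()) {
                    throw new IllegalArgumentException("truncated or corrupt input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Doing this outside Lucene leaves the compression level and buffer handling
under the application's control, which is the kind of configurability the
built-in option is said to lack elsewhere in this thread.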

Here is another argument against adding Field encryption to the lucene
core.  Changes in index format make life complex for any implementations
that deal with index files directly.  There are a number of Lucene
sister projects that do this, plus a number of applications.

I have a fast bulk updater that directly manipulates index files and am
busy upgrading it right now to the 2.1 index format with lockless
commits (which is not fully documented in the new index file formats, by
the way, e.g. the <segment>N.sM separate norm files are missing).  It's
a pain.  In general, I think changes to Lucene index format should only
be driven by compelling benefits.  Moving encryption from external to
internal to get a minor application simplification is not sufficiently
compelling to me.

Chuck




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Mike Klaas <mi...@gmail.com>.
On 12/5/06, negrinv <vi...@gmail.com> wrote:

> Chris Hostetter wrote:

> > If the code was not already in the core, and someone asked about adding it
> > I would argue against doing so on the grounds that some helpful utility
> > methods (possibly in a contrib) would be just as useful, and would have
> > no performance cost for people who don't care about compression.
> >
> Perhaps, if you look at compression on its own, but once you see compression
> in the context of all the other field options it makes sense to have it
> added to Lucene, it's about having everything in one place for ease of
> implementation that offsets the performance issue, in my opinion.

Note that built-in compression is deprecated, for similar reasons as
are being given for the encrypted fields.

> Finally a point about my code. I was unsuccessful in creating a diff file
> because I was picking up all kinds of formatting differences as well. If you
> scan it quickly you will find that it is really very simple and, at least in
> its current limited implementation, hardly invasive of Lucene's core. All
> the encryption routines are in a separate class which I placed in the
> utility package.

You can produce diffs selectively if you can't eliminate the
whitespace incoherence:
svn diff path/to/dir1 changed/path2 ...

-MIke



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by robert engels <re...@ix.netcom.com>.
If it is only meant to protect from "prying eyes", a simple field-level
analyzer that does a simple xor/rotation should suffice. It
will be much faster and simpler.

Going beyond that, your solution is not very secure, as has been
pointed out, so you might as well just use the simplest solution.
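
To make the suggestion concrete, a minimal sketch of such an xor
transformation on a single term, with Base64 so that the result is still a
String. The class name, method names and key are hypothetical; in Lucene
this logic would sit inside an analyzer or TokenFilter, which is omitted
here so the example runs on the JDK alone. This only obscures the text; it
is not encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; would be called per token at index time and query time.
final class XorTermObfuscator {

    // XORs every byte with the repeating key; applying it twice restores the input.
    private static byte[] xor(byte[] data, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    // Term text -> obscured, index-safe String.
    static String obscure(String term, byte[] key) {
        byte[] mixed = xor(term.getBytes(StandardCharsets.UTF_8), key);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(mixed);
    }

    // Obscured String -> original term text.
    static String reveal(String obscured, byte[] key) {
        byte[] mixed = Base64.getUrlDecoder().decode(obscured);
        return new String(xor(mixed, key), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = "hypothetical-key".getBytes(StandardCharsets.UTF_8);
        String obscured = obscure("salary", key);
        System.out.println(obscured + " -> " + reveal(obscured, key));
    }
}
```

The same term always maps to the same obscured String, so exact-term
queries still match as long as the query term goes through obscure() too.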


On Dec 5, 2006, at 3:28 PM, negrinv wrote:

>
>
> Chris Hostetter wrote:
>>
>>
>> Compression of stored fields is a feature that the Lucene "core"
>> currently supports out of the box -- but it does so in a very limited
>> manner that doesn't allow for much configuration. There is no advantage
>> for users in using compressed fields over compressing the data themselves
>> before adding it to the index, only disadvantages: notably the limited
>> control the user has over the compression, and added complexity for the
>> code path executed by all users -- even if they don't use compression (a
>> boolean test on "compressed" in FieldsReader may be fast ... but it's
>> still a bytecode op for every field that's completely unnecessary for a
>> large portion of the user base)
>>
>> If the code was not already in the core, and someone asked about adding
>> it I would argue against doing so on the grounds that some helpful
>> utility methods (possibly in a contrib) would be just as useful, and
>> would have no performance cost for people who don't care about
>> compression.
>>
> Perhaps, if you look at compression on its own, but once you see
> compression in the context of all the other field options it makes sense
> to have it added to Lucene; it's about having everything in one place for
> ease of implementation that offsets the performance issue, in my opinion.
>
>
>
>> First off, if all we are interested in is encrypting *stored* data,
>> then the issue becomes exactly the same as compression: there is no
>> point in putting this functionality in the "core" Lucene code base when
>> it can be done using helper utility methods -- now that that's out of
>> the way, let's talk about the good stuff...
>>
>
> As above
>
>
>
>> If we want to encrypt the text portion of Terms that are indexed for a
>> specific set of fields, this is again something that can easily be done
>> without modifying the "core" Lucene code base -- utility methods can be
>> used to help people encrypt UN_TOKENIZED Field values, and a simple
>> AnalyzerWrapper can be made to encrypt the text portion of Tokens
>> produced by another analyzer both when indexing Field values and when
>> QueryParser is analyzing input text if necessary.
>>
> I take your word for it, but wouldn't you agree that replacing all the
> above with just one line, "Field.Store.Encrypted" (or
> Field.Store.Encrypt, for compatibility with Field.Store.Compress), would
> be a lot easier to use for the average developer?
>
>
>
>> As others have already pointed out: encrypting just the Term text
>> doesn't do much to aid the overall security of your data -- because a
>> bad guy with access to your index can use the various statistics about
>> your terms (docFreq, term vectors, term positions, etc...) to aid them
>> in cracking your encryption -- maybe a user is okay with that risk, in
>> which case my previous comment about how this can easily be done without
>> modifying any core lucene classes still holds. What about users who
>> don't think this is an acceptable risk? ... a more robust encryption
>> mechanism is necessary...
>>
> Security is a big topic; we cannot hope to discuss it here. I am talking
> about some form of data protection, not security.
> When you say "a bad guy with access to your index", you imply that
> nothing can be done to protect the index. But accessing an index which
> you are determined to protect would not be easy, and would require
> expertise and money, as well as the risk of a potential jail sentence.
> If you have National Security in mind, be assured no agency responsible
> for national security will use open source software which is not
> certified, and that is downloaded from an unsecure site over the
> internet, in order to protect the nation (I hope!).
>
> If we are talking about applications which need to protect data from
> curious or even ill-intentioned eyes, then you can provide a deterrent
> by encrypting that sensitive data only. It might be a list of names, or
> balances, or credit card numbers. Lucene alone can only provide some
> form of data protection, not security. If you accept this limitation you
> will find it easier to accept the notion of encryption at field level,
> just like some relational database software encrypts at column level.
> Just as importantly, you want to be able to search over that encrypted
> field, something which my proposed code provides (within the stated
> current limitation).
>
>
>
>> So exactly what pieces of data about a set of fields in an index need
>> to be encrypted before you can adequately say that those fields are
>> encrypted? Off the top of my head I don't know, but I think the only way
>> to play it safe is to assume that *all* of the data needs to be
>> encrypted.
>>
> Cannot agree here, it's application dependent. And keep in mind that
> once you offer new functionality people will find many original
> applications for it.
>
>
>
>> Now the question becomes: do we modify all of the index writing/reading
>> code to add a lot of "if (encrypted) { ... } else { ... }" checks, or is
>> there an easier way to ensure that all of the data is encrypted without
>> impacting the majority of the user base?
>>
> A perfectly valid point; only benchmarking will tell by how much the
> current performance of Lucene will be impacted by the addition of
> encryption. Somebody in this discussion suggested a Lucene benchmarking
> tool which can be used. I am not familiar with it, but if it is easy to
> run then let's do it and resolve this part of the discussion factually.
> On a more philosophical level, are you saying that there should not be
> any added functionality to Lucene if it impacts the performance of those
> who do not need the additional functionality? This could be a major
> limitation to the future of Lucene. Perhaps one should set some small %
> limits to the level of impact, but zero could be too limiting.
>
>
>
>> I would argue that creating an EncryptedDirectory class with an API
>> that looks something like this.......
>> .............
>> .............
>>  - Do my concerns about that impact make sense to you?
>>  - Does my (high level) description of how I think encryption might
>>    make sense as an optional Lucene feature make sense?
>>  - Are there any advantages you see to your approach that you feel make
>>    it more worthwhile than a Directory based approach?
>>
>
> Points one and two are perfectly valid and make a lot of sense. Point
> three is about what is best for the majority, given that there is
> already an OS option to encrypt at directory level.
> I like field encryption because it is functionality which cannot be
> implemented at the OS level, and because its granularity and its
> similarity to existing Lucene functionality would make it more
> intuitive and easier to implement at the application level. Encrypting
> everything in a directory would have a performance impact on the
> application.
> I accept your point about the difference between a file system
> directory and a Lucene directory. But in order to overcome the lack of
> field-level encryption and to minimise the performance impact on the
> application you would be forced to create a separate index and
> directory for each field which you want encrypted. It will work, but
> it is not a solution I would like to have to adopt at the application
> level.
>
> Finally a point about my code. I was unsuccessful in creating a diff
> file because I was picking up all kinds of formatting differences as
> well. If you scan it quickly you will find that it is really very
> simple and, at least in its current limited implementation, hardly
> invasive of Lucene's core. All the encryption routines are in a
> separate class which I placed in the utility package.
>
> Victor


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.

Chris Hostetter wrote:
> 
> 
> Compression of stored fields is a feature that the Lucene "core" currently
> supports out of the box -- but it does so in a very limited manner that
> doesn't allow for much configuration.  There is no advantage for users in
> using compressed fields over compressing the data themselves before adding
> it to the index, only disadvantages: notably the limited control the user
> has over the compression, and added complexity for the code path executed
> by all users -- even if they don't use compression (a boolean test on
> "compressed" in FieldsReader may be fast ... but it's still a bytecode op
> for every field that's completely unnecessary for a large portion of the
> user base)
> 
> If the code was not already in the core, and someone asked about adding it
> I would argue against doing so on the grounds that some helpful utility
> methods (possibly in a contrib) would be just as useful, and would have
> no performance cost for people who don't care about compression.
> 
Perhaps, if you look at compression on its own. But once you see compression
in the context of all the other field options, it makes sense to have it
in Lucene: having everything in one place makes implementation easier, and
in my opinion that offsets the performance issue.



> First off, if all we are interested in is encrypting *stored* data,
> then the issue becomes exactly the same as compression: there is no point
> in putting this functionality in the "core" Lucene code base when it can
> be done using helper utility methods -- now that that's out of the way,
> let's talk about the good stuff...
> 

As above



> If we want to encrypt the text portion of Terms that are indexed for a
> specific set of fields, this is again something that can easily be done
> without modifying the "core" Lucene code base -- utility methods can be
> used to help people encrypt UN_TOKENIZED Field values, and a simple
> AnalyzerWrapper can be made to encrypt the text portion of Tokens produced
> by another analyzer both when indexing Field values and when QueryParser
> is Analyzing input text if necessary.
> 
I take your word for it, but wouldn't you agree that replacing all the above
with just one line, "Field.Store.Encrypted" (or Field.Store.Encrypt, for
compatibility with Field.Store.Compress), would be a lot easier to use for
the average developer?



> As others have already pointed out: encrypting just the Term text doesn't
> do much to aid the overall security of your data -- because a bad guy with
> access to your index can use the various statistics about your terms
> (docFreq, term vectors, term positions, etc...) to aid them in cracking
> your encryption -- maybe a user is okay with that risk, in which case my
> previous comment about how this can easily be done without modifying any
> core lucene classes still holds.  What about users who don't think this is
> an acceptable risk? ... a more robust encryption mechanism is
> necessary...
> 
Security is a big topic, we cannot hope to discuss it here. I am talking
about some form of data protection, not security.
When you say "a bad guy with access to your index", you imply that nothing
can be done to protect the index. But accessing an index which you are
determined to protect would not be easy, would require expertise, money, as
well as the risk of a potential jail sentence. If you have National Security
in mind, be assured no agency responsible for national security will use
open source software which is not certified, and that is downloaded from an
insecure site over the internet, in order to protect the nation (I hope!).

If we are talking about applications which need to protect data from curious
or even ill-intentioned eyes, then you can provide a deterrent by encrypting
that sensitive data only. It might be a list of names, or balances, or
credit card numbers. Lucene alone can only provide some form of data
protection, not security. If you accept this limitation you will find it
easier to accept the notion of encryption at field level, just like some
relational database software encrypts at column level. Just as importantly
you want to be able to search over that encrypted field, something which my
proposed code provides (within the stated current limitations).



> So exactly what pieces of data about a set of fields in an index need to
> be encrypted before you can adequately say that those fields are encrypted?
> Off the top of my head i don't know, but I think the only way to play it
> safe is to assume that *all* of the data needs to be encrypted.
> 
Cannot agree here, it's application dependent. And keep in mind that once
you offer new functionality people will find many original applications for
it. 



> Now the question becomes: do we modify all of the index writing/reading
> code to add a lot of "if (encrypted) { ... } else { ... }" checks, or is
> there an easier way to ensure that all of the data is encrypted without
> impacting the majority of the user base?
> 
A perfectly valid point; only benchmarking will tell by how much the current
performance of Lucene will be impacted by the addition of encryption.
Somebody in this discussion suggested a Lucene benchmarking tool which can
be used. I am not familiar with it, but if it is easy to run then let's do
it and settle this part of the discussion with facts.
On a more philosophical level, are you saying that no functionality should
be added to Lucene if it impacts the performance of those who do not need
it? This could be a major limitation to the future of Lucene. Perhaps one
should set some small percentage limit on the level of impact, but zero
could be too limiting.



> I would argue that creating an EncryptedDirectory class with an API that
> looks something like this.......
> .............
> .............
>  - Do my concerns about that impact make sense to you?
>  - Does my (high level) description of how i think encryption might make
>    sense as an optional Lucene feature make sense?
>  - are there any advantages you see to your approach that you feel make it
>    more worthwhile than a Directory based approach?
> 

Points one and two are perfectly valid and make a lot of sense. Point three
is about what is best for the majority, given that there is already an OS
option to encrypt at directory level.
I like field encryption because it is functionality which cannot be
implemented at the OS level, and because its granularity and its
similarity to existing Lucene functionality would make it more intuitive and
easier to implement at the application level. Encrypting everything in a
directory would have a performance impact on the application.
I accept your point about the difference between a file system directory and
a Lucene directory. But in order to overcome the lack of field-level
encryption and to minimise the performance impact on the application you
would be forced to create a separate index and directory for each field
which you want encrypted.  It will work, but it is not a solution I would
like to have to adopt at the application level.

Finally a point about my code. I was unsuccessful in creating a diff file
because I was picking up all kinds of formatting differences as well. If you
scan it quickly you will find that it is really very simple and, at least in
its current limited implementation, hardly invasive of Lucene's core. All
the encryption routines are in a separate class which I placed in the
utility package.

Victor


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Chris Hostetter <ho...@fucit.org>.
(For the record: I have deliberately avoided looking at your patch so
far, because i didn't want my opinion on the question of "should Lucene
offer encryption services" to be clouded by any specifics of your
implementation.  That said...)

As has already been pointed out, an apples to apples comparison cannot
be made between supporting encryption and supporting compression, but let's
talk about compression a little anyway.

Compression of stored fields is a feature that the Lucene "core" currently
supports out of the box -- but it does so in a very limited manner that
doesn't allow for much configuration.  There is no advantage for users in
using compressed fields over compressing the data themselves before adding
it to the index, only disadvantages: notably the limited control the user
has over the compression, and added complexity for the code path executed
by all users -- even if they don't use compression (a boolean test on
"compressed" in FieldsReader may be fast ... but it's still a bytecode op
for every field that's completely unnecessary for a large portion of the
user base)

If the code was not already in the core, and someone asked about adding it
I would argue against doing so on the grounds that some helpful utility
methods (possibly in a contrib) would be just as useful, and would have
no performance cost for people who don't care about compression.

Now let's talk about encryption again:

First off, if all we are interested in is encrypting *stored* data,
then the issue becomes exactly the same as compression: there is no point
in putting this functionality in the "core" Lucene code base when it can
be done using helper utility methods -- now that that's out of the way,
let's talk about the good stuff...
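
As an illustration of the "helper utility methods" idea for stored data,
here is a minimal sketch. It is not taken from the patch under discussion:
it assumes a modern JDK and uses AES-GCM from javax.crypto rather than the
RC4/BouncyCastle code in the proposal, and the class name StoredFieldCrypto
and its methods are hypothetical. The application encrypts the value itself,
so Lucene only ever sees an opaque binary value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper, not part of Lucene: encrypts a value before it is stored. */
final class StoredFieldCrypto {
    private static final int IV_BYTES = 12;   // commonly recommended nonce size for GCM
    private static final int TAG_BITS = 128;  // length of the authentication tag
    private static final SecureRandom RANDOM = new SecureRandom();

    private StoredFieldCrypto() {}

    /** Returns IV followed by ciphertext and tag, suitable for a binary stored field. */
    static byte[] encrypt(byte[] key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // a fresh IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.allocate(iv.length + body.length).put(iv).put(body).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encrypt(); fails if the stored bytes were modified. */
    static String decrypt(byte[] key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

The byte array returned by encrypt() could then be added to a document as
an ordinary binary stored field, with decrypt() applied to the bytes read
back from a hit. Key management is left entirely to the application.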

If we want to encrypt the text portion of Terms that are indexed for a
specific set of fields, this is again something that can easily be done
without modifying the "core" Lucene code base -- utility methods can be
used to help people encrypt UN_TOKENIZED Field values, and a simple
AnalyzerWrapper can be made to encrypt the text portion of Tokens produced
by another analyzer both when indexing Field values and when QueryParser
is Analyzing input text if necessary.
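
The term-text idea above can be sketched without touching Lucene at all.
What follows is an illustration under stated assumptions: TokenEncryptor is
a hypothetical stand-in for the AnalyzerWrapper mentioned above, and it
works on plain lists of strings rather than on Lucene's Analyzer and
TokenStream classes. The property that matters is that the transformation
is deterministic, so a query token encrypted with the same key matches the
indexed term.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical stand-in for an analyzer wrapper; it does not use Lucene's Analyzer API. */
final class TokenEncryptor {
    private final SecretKeySpec key;

    TokenEncryptor(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Deterministic: the same token always yields the same encrypted term. */
    String encryptToken(String token) {
        try {
            // ECB is used only because it is deterministic; it reveals which tokens are equal.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] out = cipher.doFinal(token.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("token encryption failed", e);
        }
    }

    /** Applies the same transformation to every token of an already analyzed stream. */
    List<String> encryptAll(List<String> tokens) {
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            result.add(encryptToken(token));
        }
        return result;
    }
}
```

The same encryptToken() call would be applied at index time and to the
tokens of an incoming query. Because equal tokens map to equal terms, exact
matches keep working, while anything that depends on term order or prefixes
(range and wildcard queries, for example) would not.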

As others have already pointed out: encrypting just the Term text doesn't
do much to aid the overall security of your data -- because a bad guy with
access to your index can use the various statistics about your terms
(docFreq, term vectors, term positions, etc...) to aid them in cracking
your encryption -- maybe a user is okay with that risk, in which case my
previous comment about how this can easily be done without modifying any
core lucene classes still holds.  What about users who don't think this is
an acceptable risk? ... a more robust encryption mechanism is
necessary...
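
The statistics concern can be made concrete with a small sketch. It is an
illustration only: the class TermStatsLeak is hypothetical, and a keyed hash
(HMAC-SHA256) is swapped in for "encrypted term text" because any
deterministic transformation behaves the same way for this purpose. The
point is that the frequency profile of the protected terms is identical to
that of the plaintext terms, and it can be read without any key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demonstration class; nothing here is Lucene API. */
final class TermStatsLeak {
    private TermStatsLeak() {}

    /** Deterministic keyed transformation standing in for encrypted term text. */
    static String protect(byte[] key, String term) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(term.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("keyed hashing failed", e);
        }
    }

    /** Sorted list of term counts: what an observer of the index can compute. */
    static List<Integer> frequencyProfile(List<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        List<Integer> profile = new ArrayList<>(counts.values());
        Collections.sort(profile);
        return profile;
    }
}
```

Computing frequencyProfile() over a list of names and over the protected
versions of the same names gives the same result, so someone who knows
roughly how common each name is can start guessing which protected term is
which.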

So exactly what pieces of data about a set of fields in an index need to
be encrypted before you can adequately say that those fields are encrypted?
Off the top of my head i don't know, but I think the only way to play it
safe is to assume that *all* of the data needs to be encrypted.  Now the
question becomes: do we modify all of the index writing/reading code
to add a lot of "if (encrypted) { ... } else { ... }" checks, or is there
an easier way to ensure that all of the data is encrypted without
impacting the majority of the user base?

I would argue that creating an EncryptedDirectory class with an API that
looks something like this...

  public class EncryptedDirectory extends Directory {
    public EncryptedDirectory(Directory wrapped, EncryptionProvider provider);
    // all Directory methods here
  }

...might be the best way to go, as it:
  1) achieves the result (provides encryption)
  2) doesn't affect performance of clients who don't care about the feature
  3) doesn't limit the functionality of users who do use the feature (the
     physical index can still be stored in a database, or stored on disk, or
     stored purely in RAM).
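
The wrapping idea can be sketched as a plain decorator. Everything below is
a simplified assumption rather than Lucene code: FileStore is a hypothetical,
much reduced stand-in for org.apache.lucene.store.Directory, the
EncryptionProvider interface is only named (not defined) in this thread, and
AES in CTR mode is an arbitrary choice of cipher.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical, much reduced stand-in for a Lucene Directory. */
interface FileStore {
    void write(String name, byte[] data);
    byte[] read(String name);
}

/** Hypothetical provider interface; its shape is an assumption. */
interface EncryptionProvider {
    byte[] encrypt(byte[] plain);
    byte[] decrypt(byte[] stored);
}

/** Keeps files in memory; a disk or database store would implement the same interface. */
final class MemoryStore implements FileStore {
    private final Map<String, byte[]> files = new HashMap<>();
    public void write(String name, byte[] data) { files.put(name, data.clone()); }
    public byte[] read(String name) { return files.get(name).clone(); }
}

/** Decorator: encrypts on write, decrypts on read, delegates the actual storage. */
final class EncryptedStore implements FileStore {
    private final FileStore wrapped;
    private final EncryptionProvider provider;

    EncryptedStore(FileStore wrapped, EncryptionProvider provider) {
        this.wrapped = wrapped;
        this.provider = provider;
    }
    public void write(String name, byte[] data) { wrapped.write(name, provider.encrypt(data)); }
    public byte[] read(String name) { return provider.decrypt(wrapped.read(name)); }
}

/** AES in CTR mode with a fresh random IV stored in front of each file. */
final class AesCtrProvider implements EncryptionProvider {
    private static final int IV_BYTES = 16;
    private final SecretKeySpec key;
    private final SecureRandom random = new SecureRandom();

    AesCtrProvider(byte[] rawKey) { this.key = new SecretKeySpec(rawKey, "AES"); }

    public byte[] encrypt(byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plain);
            byte[] out = Arrays.copyOf(iv, iv.length + body.length); // IV first
            System.arraycopy(body, 0, out, iv.length, body.length);  // then ciphertext
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    public byte[] decrypt(byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

Code that only knows about FileStore is unaffected whether or not a store
is wrapped, which is the "no impact on non-users" property argued for above.
A real Directory also has to support seeking within files, which this
whole-file sketch does not attempt.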

If users who want to use encryption really care deeply about only
having *some* of their fields encrypted, and don't want to pay the
performance costs of encryption for their other fields, they can use
a ParallelReader spanning two indexes: one using an EncryptedDirectory
wrapped around the sensitive fields and one using a regular directory
containing the non-sensitive fields.

: 1) is it a good idea to have encryption added to Lucene? I think so

: 2) assuming the answer to 1) above is yes, how should one go about including
: encryption in Lucene. My solution is just that, one approach. Others have

I would say that my answer to #1 is "maybe" and my answer to #2 is "in
some way that has no impact at all on people who don't want to use it."
That said, I'm assuming since this thread subject mentions
Field.Store.Encrypted that your approach is a fairly "low level" change
that would impact non-users (slightly, but impact nonetheless)

 - Do my concerns about that impact make sense to you?
 - Does my (high level) description of how i think encryption might make
   sense as an optional Lucene feature make sense?
 - are there any advantages you see to your approach that you feel make it
   more worthwhile than a Directory based approach?

: encryption in Lucene. My solution is just that, one approach. Others have
: proposed directory or file system encryption. My view on this is that this
: level of encryption is already provided by all major operating systems, as
: well as by some hardware devices. I would not see a justifiable benefit in
: adding it to Lucene. But that is only my personal opinion, although I am

There is a big difference however between a "file system directory" and an
"org.apache.lucene.store.Directory" -- i agree with you that just adding
the ability to encrypt an FSDirectory would have little advantage over
using a more OS based approach, but it might make a lot of sense to do it
at the Lucene Directory level -- so users can leverage it no matter where
they store their index.


-Hoss




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Doug Cutting <cu...@apache.org>.
negrinv wrote:
> there is a third way Doug,  and it's for me to stop trying to be polite by
> answering all the questions that I am being asked, then nobody will get
> upset by my replies. If the decision is for no encryption at field level, I
> accept it, but I don't believe it should be externalised. Perhaps someone
> else will pick up your offer.

I don't think anyone is upset.  I was trying to provide an alternative 
to giving up and walking away, but, yes, that's an option too.  However, 
if you think field-level encryption is something that lots of folks 
would use, and want to help provide it, it might not be your most 
satisfactory option.

Doug



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Michael Busch <bu...@gmail.com>.
negrinv wrote:
> there is a third way Doug,  and it's for me to stop trying to be polite by
> answering all the questions that I am being asked, then nobody will get
> upset by my replies. If the decision is for no encryption at field level, I
> accept it, but I don't believe it should be externalised. Perhaps someone
> else will pick up your offer.
>
> V.
>   
Victor,

Nobody is upset here (I hope you're not either :-) ). I think all Doug 
wanted to tell you is that you are quite tenacious about your point of 
putting encryption into the core of Lucene. The fact that you got a lot 
of responses to your mail shows that the developers are not neglecting 
this topic but are rather trying to find a solution better suited to 
all Lucene users.

This is just how open source works. You can make suggestions, but you 
have to listen to the community of developers and users and to what 
they think about them. There are a lot of very bright people on the 
Lucene developer team, and in almost all cases patches and new features 
benefit in the end from interesting discussions and different opinions.

So if you accept how open source works and if you're up for making 
changes to your patch so that the community will like it, everybody will 
benefit: you, because you learn from experienced people; your patch, 
because its quality improves; and the users, because they get some 
kind of encryption support in Lucene.

Regards,
Michael



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
There is a third way, Doug, and it's for me to stop trying to be polite by
answering all the questions that I am being asked; then nobody will get
upset by my replies. If the decision is for no encryption at field level, I
accept it, but I don't believe it should be externalised. Perhaps someone
else will pick up your offer.

V.


Doug Cutting wrote:
> 
>> Doug Cutting wrote:
>>> So, Victor, do you think this functionality could be reasonably packaged 
>>> as an add-on package to Lucene?
>>
>> Doug, for an answer to most of your questions could you please refer to
>> my
>> answer to Chris Hostetter  [ ... ]
> 
> Let me be more direct.  Encryption of Lucene fields may be useful. 
> However Lucene's developers don't appear to feel that it's appropriate 
> to include it in Lucene's core at this time, and it doesn't seem 
> you're making much progress convincing them.
> 
> There are two ways you could go from here: you could continue to argue 
> whether encryption should be added to the core, or you could try to 
> incorporate the feedback from the Lucene team, and perhaps try to get 
> encryption added to Lucene as an add-on package.  I think the latter 
> approach is much more likely to be successful and will be less 
> frustrating for all concerned.
> 
> Doug
> 


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Doug Cutting <cu...@apache.org>.
> Doug Cutting wrote:
>> So, Victor, do you think this functionality could be reasonably packaged 
>> as an add-on package to Lucene?
>
> Doug, for an answer to most of your questions could you please refer to my
> answer to Chris Hostetter  [ ... ]

Let me be more direct.  Encryption of Lucene fields may be useful. 
However Lucene's developers don't appear to feel that it's appropriate 
to include it in Lucene's core at this time, and it doesn't seem 
you're making much progress convincing them.

There are two ways you could go from here: you could continue to argue 
whether encryption should be added to the core, or you could try to 
incorporate the feedback from the Lucene team, and perhaps try to get 
encryption added to Lucene as an add-on package.  I think the latter 
approach is much more likely to be successful and will be less 
frustrating for all concerned.

Doug



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.

Doug Cutting wrote:
> 
> 
> Some utilities for encrypting and decrypting binary fields might make a 
> useful contrib module, but I see no compelling reason to add this to the 
> core at this point.  If such a contrib module becomes widely used, and 
> it becomes clear that it would work better if more tightly integrated, 
> then perhaps we could move it into the core.
> 
> So, Victor, do you think this functionality could be reasonably packaged 
> as an add-on package to Lucene?  If so, it would be much easier to get 
> it included in Lucene as a contrib module.
> 
> Doug
> 
Doug, for an answer to most of your questions could you please refer to my
answer to Chris Hostetter; Chris makes some valid points:

http://www.nabble.com/Re%3A-Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-p7708481.html

Victor



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Doug Cutting <cu...@apache.org>.
robert engels wrote:
> I would counter that it is not a compelling feature for MOST users of 
> Lucene, but it can still be implemented externally using binary fields 
> for those that require it, and/or even more easily (and maybe even 
> faster) using an encrypted filesystem with proper security.

+1

Some utilities for encrypting and decrypting binary fields might make a 
useful contrib module, but I see no compelling reason to add this to the 
core at this point.  If such a contrib module becomes widely used, and 
it becomes clear that it would work better if more tightly integrated, 
then perhaps we could move it into the core.

So, Victor, do you think this functionality could be reasonably packaged 
as an add-on package to Lucene?  If so, it would be much easier to get 
it included in Lucene as a contrib module.

Doug



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by robert engels <re...@ix.netcom.com>.
I think the point of the discussion is really to determine the answer  
to #1.

I would counter that it is not a compelling feature for MOST users of  
Lucene, but it can still be implemented externally using binary  
fields for those that require it, and/or even more easily (and maybe  
even faster) using an encrypted filesystem with proper security.

Adding it to the core Lucene complicates the code base, and I do not  
believe it is warranted.

This is only my opinion.

On Dec 2, 2006, at 2:38 PM, negrinv wrote:

>
> On the contrary Mike, I am beginning to think that there have been a
> number of misunderstandings, of my original posting to start with.
> When I submitted my proposal I was prepared for some discussion on the
> merits or otherwise of my proposed solution. I had no idea that the
> discussion would drift towards security and performance in absolute
> terms. I would now like to steer the debate in its intended direction.
>
> I have no difficulty agreeing with you on both counts. A non-encrypted
> swap file is a security risk, and encryption imposes a performance
> penalty. I submit that neither is relevant to my posting, for the
> following reasons.
> Security is all about knowing where you stand so you can take
> counter-measures; it is not about a "false sense of security" provided
> by knowing you have an encrypted swap file or a 3000 byte encryption key.
> Lucene cannot provide security. It would be a legal nightmare and an
> absurd expectation. The underlying operating system within which Lucene
> runs does not guarantee security, the encryption software provider does
> not guarantee security, and password protection and physical security
> are also outside of Lucene's control. What Lucene can do is to provide
> encryption services, while the application has to provide a given level
> of security. For instance, if you run under an operating system which
> does not provide swap file encryption, then you must disable the swap
> file. Does that impose a performance penalty? Probably, if your memory
> is limited, but now you know where you stand so you make a decision:
> performance, or encryption, or more memory. But one cannot, in my view,
> shift the responsibility for that decision to Lucene.
> I'll give you another example: you mentioned padding of 128 bits. True,
> there are encryption routines which impose that penalty. For my
> (initial) implementation I had the choice between an algorithm with
> padding, or RC4, which does not pad. A 10 character term remains a 10
> character term after encryption. No padding and no index size
> implications. I said so in my posting, and as an application developer
> you then have a choice to make: use Lucene RC4 encryption as proposed
> (for the time being), or use another product, or write your own. Without
> knowing the application, any decision would be totally out of context,
> and no one piece of software can satisfy all applications. A possible
> solution would be for Lucene to offer a choice of algorithms.
>
> The army I am sure would like to run its tanks at the speed of a
> Ferrari, but it cannot; it hits a wall known as the cost-benefit ratio.
> It must choose between security and speed and budget, keeping in mind
> the application. The modern tank is the answer. A compromise.
> My original posting avoided the notion of security and performance in
> absolute terms precisely because of all the above considerations; it
> simply addressed a couple of points which need to be resolved before the
> specifics of the implementation can be discussed.
>
> 1) is it a good idea to have encryption added to Lucene? I think so
> obviously, but not everyone agrees. As was pointed out in this
> discussion, some relational database software provides encryption at
> the column level, a functionality equivalent to the one I proposed.
> Lucene in some ways competes with relational databases.
>
> 2) assuming the answer to 1) above is yes, how should one go about
> including encryption in Lucene? My solution is just that, one approach.
> Others have proposed directory or file system encryption. My view on
> this is that this level of encryption is already provided by all major
> operating systems, as well as by some hardware devices. I would not see
> a justifiable benefit in adding it to Lucene. But that is only my
> personal opinion, although I am aware that directory encryption is in
> the hands of the system administrator, not the application end user.
> Perhaps there are other options which have not been raised yet.
>
> 3) assuming my proposal is acceptable, can it be implemented better? I
> am not a Lucene expert, I learned Lucene on the go. I would be delighted
> to see a better solution presented; it would be a learning experience
> for me.
>
> I hope I have not added to the confusion.
>
> Season's greetings to you and to all who took time to participate in
> this discussion.
> Victor
>
> Robert Engels wrote:
>>
>> I think you misunderstood. If you do not have encrypted swap (like
>> OSX provides for) then your encryption is pointless as anyone can
>> inspect the data as it is loaded into the heap by lucene - bypassing
>> the encryption.
>>
>> I also think you underestimated the impact on the size of the
>> indexes, as most secure encryption schemes are going to pad the
>> payloads to a minimum of 128 bits, and usually much more.
>>
>> This is going to make a HUGE difference in the size of the index.
>>
>> On Dec 1, 2006, at 2:00 PM, negrinv wrote:
>>
>>>
>>> Good news for OSX users! But what about all the others, should I
>>> say the majority??
>>> One more reason for encrypting at field level.
>>> Victor
>>>
>>>
>>> Robert Engels wrote:
>>>>
>>>> Not if running under OSX with encrypted swap turned on ! :)
>>>>
>>>> -----Original Message-----
>>>>> From: Nicolas Lalevée <ni...@anyware-tech.com>
>>>>> Sent: Dec 1, 2006 4:49 AM
>>>>> To: java-dev@lucene.apache.org
>>>>> Subject: Re: Attached proposed modifications to Lucene 2.0 to
>>>>> support Field.Store.Encrypted
>>>>>
>>>>> On Friday 1 December 2006 11:10, negrinv wrote:
>>>>>> Nicolas Lalevée-2 wrote:
>>>>>>> On Friday 1 December 2006 01:33, negrinv wrote:
>>>>>>>> Thank you Robert for your comments. I am inclined to agree with
>>>>>>>> you, but I would like to establish first of all if simplicity of
>>>>>>>> implementation is the overriding consideration. But before I
>>>>>>>> dwell on that let me say that I have discovered that I am not a
>>>>>>>> master of DIFF file creation with Eclipse. The diff file
>>>>>>>> attachment to my original posting is absurdly large and not
>>>>>>>> correct. I have therefore attached a zip file containing the
>>>>>>>> complete source code of the classes I modified. I leave it to
>>>>>>>> others to extract the diffs properly.
>>>>>>>> Back to the issue. So far the implementation has not been
>>>>>>>> difficult considering that I knew nothing about Lucene internals
>>>>>>>> before I started. The reason is that Lucene is very well
>>>>>>>> structured and the changes just fitted nicely by adding some code
>>>>>>>> in the right place with minimal changes to the existing code. But
>>>>>>>> I admit that the proposed implementation so far is not complete
>>>>>>>> and more work is required to overcome some of its restrictions.
>>>>>>>> While I like your idea I believe that it imposes too large a
>>>>>>>> granularity on the encrypted data: all fields with all kinds of
>>>>>>>> data will be encrypted, including images and others which
>>>>>>>> normally would be left alone, thus adding to the performance
>>>>>>>> penalty due to encryption.
>>>>>>>
>>>>>>> I don't agree with you here. In Lucene, you will encrypt the  
>>>>>>> field
>>>>>> data,
>>>>>>> the
>>>>>>> field names, and the tokens : I would say that is represents at
>>>>>>> least
>>>>>> 2/3
>>>>>>> of
>>>>>>> the index size. Then, with the implementation you suggest, I  
>>>>>>> think
>>>>>> (sorry
>>>>>>> I
>>>>>>> didn't took time to see you patch) that every time a lucene
>>>>>>> data need
>>>>>> to
>>>>>>> be
>>>>>>> read, it is decrypted each time. With an encrypted FS, your  
>>>>>>> kernel
>>>>>> will
>>>>>>> maintain a cache in RAM for you, so it won't hurt so much.
>>>>>>> It needs some bench to see what is effectively the best, but I
>>>>>>> have
>>>>>> doubt
>>>>>>> that
>>>>>>> your solution will be faster.
>>>>>>>
>>>>>>> Nicolas.
>>>>>>
>>>>>> Nicolas, I am all in favour of some tests to establish which
>>>>>> solution is
>>>>>> best, but I have to say that I don't believe file system or
>>>>>> directory
>>>>>> encryption in Lucene is really justified. Most operating system
>>>>>> already
>>>>>> provide this feature, although they are system-wide or policy- 
>>>>>> based
>>>>>> solution, hence not always within individual user control.
>>>>>> But if the issue is user control, then I believe Lucene should
>>>>>> provide
>>>>>> maximum granularity when it comes to choice of data to encrypt.
>>>>>> The issue I believe is whether some form of encryption should be
>>>>>> provided
>>>>>> within Lucene to enable application developers to create
>>>>>> applications
>>>>>> which
>>>>>> offer some data protection under user control, with a minimum of
>>>>>> impact,
>>>>>> where by impact I mean both on peformance and workload either in
>>>>>> Lucene
>>>>>> code or user code.
>>>>>
>>>>> In fact you mean a user that has no control of it's machine, and
>>>>> that
>>> cannot
>>>>> encrypt his partition. Here you will have the issue with the
>>>>> swap : Lucene
>>>>> will decrypt the data in RAM, that can possibly pushed on the
>>>>> swap... I
>>> know
>>>>> this is extreme, but it's a security hole.
>>>>>
>>>>> -- 
>>>>> Nicolas LALEV�E
>>>>> Solutions & Technologies
>>>>> ANYWARE TECHNOLOGIES
>>>>> Tel : +33 (0)5 61 00 52 90
>>>>> Fax : +33 (0)5 61 00 51 46
>>>>> http://www.anyware-tech.com
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> --
>>>>> -
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/Attached-
>>> proposed-modifications-to-Lucene-2.0-to-support-
>>> Field.Store.Encrypted-tf2727614.html#a7645198
>>> Sent from the Lucene - Java Developer mailing list archive at
>>> Nabble.com.
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/Attached- 
> proposed-modifications-to-Lucene-2.0-to-support- 
> Field.Store.Encrypted-tf2727614.html#a7657011
> Sent from the Lucene - Java Developer mailing list archive at  
> Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
On the contrary, Mike, I am beginning to think that there have been a number
of misunderstandings, starting with my original posting.
When I submitted my proposal I was prepared for some discussion on the
merits or otherwise of my proposed solution. I had no idea that the
discussion would drift towards security and performance in absolute terms. I
would now like to steer the debate back in its intended direction.

I have no difficulty agreeing with you on both counts. A non-encrypted swap
file is a security risk, and encryption imposes a performance penalty. Neither
point, I submit, is relevant to my posting, for the following reasons.
Security is all about knowing where you stand so you can take
counter-measures; it is not about a "false sense of security" provided by
knowing you have an encrypted swap file or a 3000 byte encryption key.
Lucene cannot provide security. It would be a legal nightmare and an absurd
expectation. The underlying operating system within which Lucene runs does
not guarantee security, the encryption software provider does not guarantee
security, and password protection and physical security are also outside of
Lucene's control. What Lucene can do is provide encryption services,
while the application has to provide a given level of security. For
instance, if you run under an operating system which does not provide swap
file encryption, then you must disable the swap file. Does that impose a
performance penalty? Probably, if your memory is limited, but now you know
where you stand, so you can make a decision: performance, or encryption, or
more memory. But one cannot, in my view, shift the responsibility for that
decision to Lucene.
I'll give you another example. You mentioned padding to 128 bits. True,
there are encryption routines which impose that penalty. For my (initial)
implementation I had the choice between an algorithm with padding, or RC4,
which does not pad. A 10 character term remains a 10 character term after
encryption. No padding and no index size implications. I said so in my
posting, and as an application developer you then have a choice to make: use
Lucene RC4 encryption as proposed (for the time being), or use another
product, or write your own. Without knowing the application, any decision
would be totally out of context, and no one piece of software can satisfy
all applications. A possible solution would be for Lucene to offer a choice
of algorithms.
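
As an illustration of the no-padding point, here is a minimal sketch. It is
not the code from the proposed patch: it assumes the JDK's built-in RC4
(named "ARCFOUR" in the Java Cryptography Extension) instead of BouncyCastle,
and the class name, helper methods and demo key are all hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

/** Illustrative only: RC4 output has the same byte length as its input. */
public class Rc4LengthDemo {

    // Hypothetical 16-byte demo key; a real application would manage keys securely.
    private static final byte[] DEMO_KEY =
            "demo-key-16bytes".getBytes(StandardCharsets.US_ASCII);

    /** Runs RC4 ("ARCFOUR" in the JCE) over the input in the given cipher mode. */
    public static byte[] rc4(int mode, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("ARCFOUR");
            cipher.init(mode, new SecretKeySpec(DEMO_KEY, "ARCFOUR"));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("RC4 is not available in this JRE", e);
        }
    }

    /** Encrypts the UTF-8 bytes of a term. */
    public static byte[] encrypt(String term) {
        return rc4(Cipher.ENCRYPT_MODE, term.getBytes(StandardCharsets.UTF_8));
    }

    /** Decrypts back to the original term. */
    public static String decrypt(byte[] ciphertext) {
        return new String(rc4(Cipher.DECRYPT_MODE, ciphertext), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String term = "encryption"; // a 10 character term
        byte[] ciphertext = encrypt(term);
        System.out.println(term.length() + " chars in, " + ciphertext.length + " bytes out");
        System.out.println("round trip: " + decrypt(ciphertext));
    }
}
```

The length is preserved in bytes, so the "10 characters stay 10 characters"
statement holds exactly for single-byte (ASCII) terms; the sketch says
nothing about how Lucene itself serialises the encrypted term.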

The army, I am sure, would like to run its tanks at the speed of a Ferrari,
but it cannot; it hits a wall known as the cost-benefit ratio. It must choose
between security and speed and budget, keeping in mind the application. The
modern tank is the answer. A compromise.
My original posting avoided the notion of security and performance in
absolute terms precisely because of all the above considerations; it simply
addressed a couple of points which need to be resolved before the specifics
of the implementation can be discussed.

1) Is it a good idea to have encryption added to Lucene? I think so,
obviously, but not everyone agrees. As was pointed out in this discussion,
some relational database software provides encryption at the column level, a
functionality equivalent to the one I proposed. Lucene in some ways competes
with relational databases.

2) Assuming the answer to 1) above is yes, how should one go about including
encryption in Lucene? My solution is just that, one approach. Others have
proposed directory or file system encryption. My view is that this
level of encryption is already provided by all major operating systems, as
well as by some hardware devices. I would not see a justifiable benefit in
adding it to Lucene. But that is only my personal opinion, although I am
aware that directory encryption is in the hands of the system administrator,
not the application end user. Perhaps there are other options which have not
been raised yet.

3) Assuming my proposal is acceptable, can it be implemented better? I am
not a Lucene expert; I learned Lucene on the go. I would be delighted to see
a better solution presented; it would be a learning experience for me.

I hope I have not added to the confusion.

Season's greetings to you and to all who took time to participate in this
discussion.
Victor

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7657011
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by robert engels <re...@ix.netcom.com>.
I think you misunderstood. If you do not have encrypted swap (like
OSX provides for) then your encryption is pointless, as anyone can
inspect the data as it is loaded into the heap by Lucene, bypassing
the encryption.

I also think you underestimated the impact on the size of the
indexes, as most secure encryption schemes are going to pad the
payloads to a minimum of 128 bits, and usually much more.

This is going to make a HUGE difference in the size of the index.
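
To put rough numbers on the padding concern, the following is a minimal
sketch under stated assumptions: a 128-bit block cipher (AES) in CBC mode
with PKCS#5 padding, an all-zero key and IV used purely for illustration,
and a hypothetical class name. It is not taken from any patch in this
thread, and it does not count the extra bytes needed if an IV is stored
with each field.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

/** Illustrative only: how block-cipher padding inflates short payloads. */
public class PaddingOverheadDemo {

    static final int BLOCK = 16; // AES block size in bytes (128 bits)

    /** PKCS#5/PKCS#7 padding adds 1..BLOCK bytes, rounding up to a block multiple. */
    public static int paddedLength(int plainLength) {
        return (plainLength / BLOCK + 1) * BLOCK;
    }

    /** Actual ciphertext length reported by the JCE, for comparison with the formula. */
    public static int aesCbcLength(int plainLength) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(new byte[BLOCK], "AES"), // all-zero demo key
                    new IvParameterSpec(new byte[BLOCK]));     // all-zero demo IV
            return cipher.doFinal(new byte[plainLength]).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Plaintext lengths in bytes, e.g. short index terms.
        for (int n : new int[] {1, 10, 15, 16, 17}) {
            System.out.println(n + " -> " + paddedLength(n) + " (formula), "
                    + aesCbcLength(n) + " (AES/CBC/PKCS5Padding)");
        }
    }
}
```

Under these assumptions a 10 byte term grows to 16 bytes and a 16 byte term
to 32, so the relative overhead is largest for short terms.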


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Chris Hostetter <ho...@fucit.org>.
: dismay I noticed that JIRA assigned licence to the ASF for the provider
: software. something which I did not intend and which cannot be valid. Can it
: be reversed please?)


the flag can't be modified, but attachments can be deleted, which i have
done for the jar in question.

feel free to reattach without granting license to ASF for inclusion.



-Hoss


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
Otis, I have raised JIRA issue LUCENE-737. Please note a couple of points. The
text of the issue is a cut and paste of my original posting, which did not
include the encryption provider software, and attached the Lucene
modifications in diff form. The modifications attached to the JIRA issue are
not in diff form, and the encryption provider software is attached. (To my
dismay I noticed that JIRA assigned licence to the ASF for the provider
software, something which I did not intend and which cannot be valid. Can it
be reversed please?)


Otis Gospodnetic wrote:
> 
> Victor,
> 
> I haven't looked at your patch/ZIP.  It would be best to attach to a new
> JIRA issue.  That will be easier for others to look at (I already don't
> have the email with your ZIP, for example).  Also, a patch is stroooongly
> preferred if you've made changes to existing code.  If you can't do it
> from Eclipse, do it from the command-line: svn diff src, or some such.
> 
> Otis
> 
> ----- Original Message ----
> From: negrinv <vi...@gmail.com>
> To: java-dev@lucene.apache.org
> Sent: Friday, December 1, 2006 5:10:47 AM
> Subject: Re: Attached proposed modifications to Lucene 2.0 to support
> Field.Store.Encrypted
> 
> Nicolas Lalevée-2 wrote:
>> 
>> On Friday 1 December 2006 01:33, negrinv wrote:
>>> Thank you Robert for your comments. I am inclined to agree with you,
>>> but I would like to establish first of all if simplicity of
>>> implementation is the overriding consideration. But before I dwell on
>>> that let me say that I have discovered that I am not a master of DIFF
>>> file creation with Eclipse. The diff file attachment to my original
>>> posting is absurdly large and not correct. I have therefore attached a
>>> zip file containing the complete source code of the classes I modified.
>>> I leave it to others to extract the diffs properly.
>>> Back to the issue. So far the implementation has not been difficult
>>> considering that I knew nothing about Lucene internals before I started.
>>> The reason is that Lucene is very well structured and the changes just
>>> fitted nicely by adding some code in the right place with minimal
>>> changes to the existing code. But I admit that the proposed
>>> implementation so far is not complete and more work is required to
>>> overcome some of its restrictions. While I like your idea I believe that
>>> it imposes too large a granularity on the encrypted data: all fields
>>> with all kinds of data will be encrypted, including images and others
>>> which normally would be left alone, thus adding to the performance
>>> penalty due to encryption.
>> 
>> I don't agree with you here. In Lucene, you will encrypt the field data,
>> the field names, and the tokens: I would say that this represents at
>> least 2/3 of the index size. Then, with the implementation you suggest, I
>> think (sorry, I didn't take the time to look at your patch) that every
>> time Lucene data needs to be read, it is decrypted each time. With an
>> encrypted FS, your kernel will maintain a cache in RAM for you, so it
>> won't hurt so much.
>> It needs some benchmarking to see what is effectively the best, but I
>> doubt that your solution will be faster.
>> 
>> Nicolas.
>> 
> 
> Nicolas, I am all in favour of some tests to establish which solution is
> best, but I have to say that I don't believe file system or directory
> encryption in Lucene is really justified. Most operating systems already
> provide this feature, although they are system-wide or policy-based
> solutions, hence not always within individual user control.
> But if the issue is user control, then I believe Lucene should provide
> maximum granularity when it comes to choice of data to encrypt.
> The issue I believe is whether some form of encryption should be provided
> within Lucene to enable application developers to create applications
> which offer some data protection under user control, with a minimum of
> impact, where by impact I mean both on performance and workload either in
> Lucene code or user code.
> I cannot test the performance issues until there is an alternative
> solution in place. If you have one and you can make it available I will be
> happy to give it an impartial test.
> Victor
> 

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7647159
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
Good news for OSX users! But what about all the others, should I say the
majority?
One more reason for encrypting at field level.
Victor


Robert Engels wrote:
> 
> Not if running under OSX with encrypted swap turned on ! :)
> 
> -----Original Message-----
>>From: Nicolas Lalev�e <ni...@anyware-tech.com>
>>Sent: Dec 1, 2006 4:49 AM
>>To: java-dev@lucene.apache.org
>>Subject: Re: Attached proposed modifications to Lucene 2.0 to support
Field.Store.Encrypted
>>
>>Le Vendredi 1 D�cembre 2006 11:10, negrinv a �crit�:
>>> Nicolas Lalev�e-2 wrote:
>>> > Le Vendredi 1 D�cembre 2006 01:33, negrinv a �crit :
>>> >> Thank you Robert for your commnets. I am inclined to agree with you,
>>> but
>>> >> I
>>> >> would like to establish first of all if simplicity of implementation
>>> is
>>> >> the
>>> >> overriding consideration. But before I dwell on that let me say that
>>> i
>>> >> have
>>> >> discovered that I am not a master of DIFF file creation with Eclipse.
>>> >> The diff file attachement to my original posting is absurdly large
>>> and
>>> >> not correct. I have therefore attached a zip file containing the
>>> >> complete source code of the classes I modified. I leave it to others
>>> to
>>> >> extract the
>>> >> diffs properly.
>>> >> Back to the issue. So far the implementation has not been difficult
>>> >> considering that I knew nothing about Lucene internals before I
>>> started.
>>> >> The reason is that Lucene is very well structured and the changes
>>> just
>>> >> fitted nicely by adding some code in the right place with minimal
>>> >> changes to the existing code. But I admit that the proposed
>>> >> implementation so far is not complete and more work is required to
>>> >> overcome some of its restrictions. While I like your idea I believe
>>> that
>>> >> it imposed too large a
>>> >> granularity on the encrypted data, all fields will all kinds of data
>>> >> will be encrypted including  images and others which normally would
>>> be
>>> >> left alone, thus adding to the performance penalty due to encryption.
>>> >
>>> > I don't agree with you here. In Lucene, you will encrypt the field
>>> data,
>>> > the
>>> > field names, and the tokens : I would say that is represents at least
>>> 2/3
>>> > of
>>> > the index size. Then, with the implementation you suggest, I think
>>> (sorry
>>> > I
>>> > didn't took time to see you patch) that every time a lucene data need
>>> to
>>> > be
>>> > read, it is decrypted each time. With an encrypted FS, your kernel
>>> will
>>> > maintain a cache in RAM for you, so it won't hurt so much.
>>> > It needs some bench to see what is effectively the best, but I have
>>> doubt
>>> > that
>>> > your solution will be faster.
>>> >
>>> > Nicolas.
>>>
>>> Nicolas, I am all in favour of some tests to establish which solution is
>>> best, but I have to say that I don't believe file system or directory
>>> encryption in Lucene is really justified. Most operating systems already
>>> provide this feature, although as system-wide or policy-based
>>> solutions, hence not always within individual user control.
>>> But if the issue is user control, then I believe Lucene should provide
>>> maximum granularity when it comes to choice of data to encrypt.
>>> The issue I believe is whether some form of encryption should be provided
>>> within Lucene to enable application developers to create applications
>>> which offer some data protection under user control, with a minimum of
>>> impact, where by impact I mean both on performance and workload either in
>>> Lucene code or user code.
>>
>>In fact you mean a user who has no control of his machine, and who cannot
>>encrypt his partition. Here you will have the issue with the swap: Lucene
>>will decrypt the data in RAM, which can possibly be pushed to swap... I know
>>this is extreme, but it's a security hole.
>>
>>-- 
>>Nicolas LALEVÉE
>>Solutions & Technologies
>>ANYWARE TECHNOLOGIES
>>Tel : +33 (0)5 61 00 52 90
>>Fax : +33 (0)5 61 00 51 46
>>http://www.anyware-tech.com
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7645198
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
I think we should not make too many assumptions about performance until we
can test alternative solutions.
In my opinion the small payload overhead will be amply offset by the ability
to be very selective about what is encrypted, as opposed to wholesale
encryption and decryption. We should also look at performance in the larger
context of all the possible reasons why users might need encryption. A large
proportion may not be worried about performance at all. And in the final
analysis any performance degradation is unlikely to be crippling; we are
probably talking about very small percentages either way. As long as those
figures are known and made available, users can make an informed decision.
Victor
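
One way to put numbers on this rather than assume them is a small harness
like the following. It is only a sketch under stated assumptions: the class
and method names are hypothetical, and AES in ECB mode with padding from the
standard JDK stands in for whatever cipher is actually compared (it is not
the RC4 used in the proposed patch, and ECB is not suitable for real data).
It contrasts "a fresh cipher per small payload" with "one cipher over the
whole stream", and returns the ciphertext size so that padding overhead is
visible next to the timing.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical harness; not part of Lucene or of the proposed patch. */
final class SmallPayloadCost {

    // Assumption: AES/ECB/PKCS5Padding is used purely as a measurable stand-in.
    private static Cipher newCipher(byte[] key16) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key16, "AES"));
        return cipher;
    }

    /** Per-field style: a fresh cipher and a padded final block for every payload. */
    static long encryptEach(byte[] key16, byte[][] payloads) {
        try {
            long totalBytes = 0;
            for (byte[] payload : payloads) {
                totalBytes += newCipher(key16).doFinal(payload).length;
            }
            return totalBytes;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Bulk style: one cipher, payloads streamed through it, padded once at the end. */
    static long encryptBulk(byte[] key16, byte[][] payloads) {
        try {
            Cipher cipher = newCipher(key16);
            long totalBytes = 0;
            for (byte[] payload : payloads) {
                byte[] out = cipher.update(payload); // may be null until a block is complete
                if (out != null) {
                    totalBytes += out.length;
                }
            }
            return totalBytes + cipher.doFinal().length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Wall-clock time of one run; a real benchmark would warm up and repeat. */
    static long nanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];               // all-zero demo key, for measurement only
        byte[][] payloads = new byte[1000][10];  // 1000 payloads of 10 bytes each
        System.out.println("each: " + encryptEach(key, payloads) + " bytes, "
                + nanos(() -> encryptEach(key, payloads)) + " ns");
        System.out.println("bulk: " + encryptBulk(key, payloads) + " bytes, "
                + nanos(() -> encryptBulk(key, payloads)) + " ns");
    }
}
```

With 1000 payloads of 10 bytes, the per-payload variant produces 16000 bytes
of ciphertext and the bulk variant 10016, which is the kind of size growth a
padded block cipher causes on small values. A length-preserving stream cipher
such as the RC4 described in the original post would not show this growth, so
the timing column is the more relevant one for that case.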


Robert Engels wrote:
> 
> I agree with Nicolas.
> 
> I think the overhead of decrypting such small payloads will have a serious
> impact on performance. (I think it is also subject to an easy attack,
> and/or will increase index size dramatically in order to prevent such
> small encryption blocks.)
> 
> We use Lucene for indexing only and store the actual payloads elsewhere,
> so your solution is not optimal in our case.
> -----Original Message-----
>>From: Nicolas Lalevée <ni...@anyware-tech.com>
>>Sent: Dec 1, 2006 2:20 AM
>>To: java-dev@lucene.apache.org
>>Subject: Re: Attached proposed modifications to Lucene 2.0 to support
>>Field.Store.Encrypted
>>
>>On Friday 1 December 2006 01:33, negrinv wrote:
>>> Thank you Robert for your comments. I am inclined to agree with you, but
>>> I would like to establish first of all if simplicity of implementation is
>>> the overriding consideration. But before I dwell on that let me say that
>>> I have discovered that I am not a master of DIFF file creation with
>>> Eclipse. The diff file attachment to my original posting is absurdly
>>> large and not correct. I have therefore attached a zip file containing
>>> the complete source code of the classes I modified. I leave it to others
>>> to extract the diffs properly.
>>> Back to the issue. So far the implementation has not been difficult
>>> considering that I knew nothing about Lucene internals before I started.
>>> The reason is that Lucene is very well structured and the changes just
>>> fitted nicely by adding some code in the right place with minimal
>>> changes to the existing code. But I admit that the proposed
>>> implementation so far is not complete and more work is required to
>>> overcome some of its restrictions. While I like your idea I believe that
>>> it imposes too large a granularity on the encrypted data: all fields with
>>> all kinds of data will be encrypted, including images and others which
>>> normally would be left alone, thus adding to the performance penalty due
>>> to encryption.
>>
>>I don't agree with you here. In Lucene, you will encrypt the field data,
>>the field names, and the tokens: I would say that represents at least 2/3
>>of the index size. Then, with the implementation you suggest, I think
>>(sorry, I didn't take the time to look at your patch) that every time
>>Lucene data needs to be read, it is decrypted again. With an encrypted FS,
>>your kernel will maintain a cache in RAM for you, so it won't hurt so much.
>>It needs some benchmarking to see which is effectively the best, but I
>>doubt that your solution will be faster.
>>
>>Nicolas.
>>
>>> Many hardware devices and most operating systems already provide
>>> directory or file system encryption, therefore that level of encryption
>>> appears to me an unnecessary addition to Lucene. Encryption at field
>>> level however is not provided by anything I know. The key in my opinion
>>> is to decide what is best from the end user point of view, but perhaps we
>>> need more discussion on this.
>>> Victor
>>>
>>> http://www.nabble.com/file/4390/LuceneEncryptionMods.zip
>>> LuceneEncryptionMods.zip
>>>
>>> Robert Engels wrote:
>>> > I think a simpler solution would be to create an EncryptedDirectory
>>> > implementation of Directory, which requires a password to open/modify
>>> > the directory.
>>> >
>>> > Far simpler, and if you are using encryption to begin with, you are
>>> > probably encrypting most of the data anyway.
>>> >
>>> > -----Original Message-----
>>> >
>>> >>From: negrinv <vi...@gmail.com>
>>> >>Sent: Nov 29, 2006 9:45 PM
>>> >>To: java-dev@lucene.apache.org
>>> >>Subject: Re: Attached proposed modifications to Lucene 2.0 to support
>>>
>>> Field.Store.Encrypted
>>>
>>> >>Thank you Luke for your comments and the references you supplied. I
>>> >>read through them and reached the following conclusions. There seems to
>>> >>be a philosophical issue about the boundary between a user application
>>> >>and the Lucene API: where should one start and the other stop?
>>> >>The other issue is the significant difference between compression and
>>> >>encryption.
>>> >>As far as the first issue is concerned it is really a matter of
>>> >>personal choice and preference. My feeling is that as long as adding
>>> >>functionality does not impair the performance of the API as a whole, it
>>> >>makes sense to add it to Lucene and thus simplify the task of the
>>> >>application developer. After all, application developers do not have to
>>> >>use all the features of the API and always have the option of
>>> >>subclassing, writing a better version of it if they can, or writing the
>>> >>functionality as part of the application, even if the API provides that
>>> >>functionality already. The API is there to make life easier for those
>>> >>developers who want to use it, nobody "has" to use it.
>>> >>The second issue is more technical. Compression simply compresses the
>>> >>stored data to save storage. The index itself is not compressed,
>>> >>therefore searching proceeds as normal. With encryption however you
>>> >>must encrypt the index as well as the stored data, otherwise one could
>>> >>reconstruct the source document from the index and thus defeat the
>>> >>purpose of encryption. Correct me if I am wrong, but I think that
>>> >>encrypting the Lucene index is not easy to achieve from outside of
>>> >>Lucene, it implies re-writing as part of the application much code now
>>> >>part of Lucene (see issue number one above), hence my preference for
>>> >>including it as part of the Lucene API rather than as part of the
>>> >>application.
>>> >>Victor
>>> >>
>>> >>Luke Nezda wrote:
>>> >>> I think that adding encryption support to Lucene fields is a bad
>>> >>> idea for the same reasons adding compression was a bad idea
>>> >>> (conclusive comments on the tail of this issue:
>>> >>> http://issues.apache.org/jira/browse/LUCENE-648?page=all). Binary
>>> >>> fields can be used by users to achieve this end. Maybe a contrib with
>>> >>> utility methods would be a compromise to preserve this work and make
>>> >>> it accessible to others, or alternatively just a FAQ entry with the
>>> >>> sample code or references to it.
>>> >>> Luke
>>> >>>
>>> >>> On 11/29/06, negrinv <vi...@gmail.com> wrote:
>>> >>>> Attached are proposed modifications to Lucene 2.0 to support
>>> >>>> Field.Store.Encrypted.
>>> >>>> The rationale behind this proposal is simple. Since Lucene can store
>>> >>>> data in the index, it effectively makes the data portable. It is
>>> >>>> conceivable that some of the data may be sensitive in nature, hence
>>> >>>> the option to encrypt it.
>>> >>>> Both the data and its index are encrypted in this implementation.
>>> >>>> This is only an initial implementation. It has the following
>>> >>>> restrictions, all of which can be resolved if required, albeit with
>>> >>>> some effort and more changes to Lucene:
>>> >>>> 1) Binary and compressed fields cannot be encrypted as well (a
>>> >>>> plaintext once encrypted becomes binary).
>>> >>>> 2) Field.Store.Encrypted implies Field.Store.Yes.
>>> >>>> This makes sense but it forces one to store the data in the same
>>> >>>> index where the tokens are stored. It may be preferable at times to
>>> >>>> have two indices, one for tokens, the other for the data.
>>> >>>> 3) As implemented, it uses RC4 encryption from BouncyCastle. This is
>>> >>>> an open source package, very simple to use, which has the advantage
>>> >>>> of guaranteeing that the length of the encrypted field is the same
>>> >>>> as the original plaintext. As of Java 1.5 (5.0) Sun provides an RC4
>>> >>>> equivalent in its Java Cryptography Extension, but unfortunately not
>>> >>>> in Java 1.4.
>>> >>>> The BouncyCastle RC4 is not the only algorithm available; others not
>>> >>>> depending on third party code can be used, but it was just the
>>> >>>> simplest to implement for this first attempt.
>>> >>>> 4) The attachments are modifications in diff form based on an early
>>> >>>> (I think August or September '06) repository snapshot of Lucene 2.0
>>> >>>> subsequently updated from the Lucene repository on 29/11/06. They
>>> >>>> may need some additional work to merge with the latest version in
>>> >>>> the Lucene repository. They also include a couple of JUnit test
>>> >>>> programs which explain, as well as test, the usage. You will need
>>> >>>> the BouncyCastle .jar (bcprov-jdk14-134.jar) to run them. I did not
>>> >>>> attach it to minimize the size of the attachments, but it can be
>>> >>>> downloaded free from:
>>> >>>> http://www.bouncycastle.org/latest_releases.html
>>> >>>>
>>> >>>> 5) Searching an encrypted field is restricted to single terms, no
>>> >>>> phrase or boolean searches allowed yet, and the term has to be
>>> >>>> encrypted by the application before searching it. (ref. attached
>>> >>>> JUnit test programs)
>>> >>>>
>>> >>>> To the extent that I have tested it, the code works as intended and
>>> >>>> does not appear to introduce any regression problems, but more
>>> >>>> testing by others would be desirable.
>>> >>>> I don't propose at this stage to do any further work with these API
>>> >>>> extensions unless there is some expression of interest and direction
>>> >>>> from the Lucene Developers team. I have an application ready to roll
>>> >>>> which uses the proposed Lucene encryption API additions (please see
>>> >>>> http://www.kbforge.com/index.html). The application is not yet
>>> >>>> available for downloading simply because I am not sure if the Lucene
>>> >>>> licence allows me to do so. I would appreciate your advice in this
>>> >>>> regard. My application is free but its source code is not available
>>> >>>> (yet). I should add that encryption does not have to be an integral
>>> >>>> part of Lucene, it can be just part of the end application, but
>>> >>>> somehow it seems to me that Field.Store.Encrypted belongs in the
>>> >>>> same category as compression and binary values.
>>> >>>> I would be happy to receive your feedback.
>>> >>>>
>>> >>>> victor negrin
>>> >>>>
>>> >>>> http://www.nabble.com/file/4376/luceneDiff2.txt luceneDiff2.txt
>>> >>>> http://www.nabble.com/file/4377/TestEncryptedDocument.java
>>> >>>> TestEncryptedDocument.java
>>> >>>> http://www.nabble.com/file/4378/TestDocument.java TestDocument.java
>>> >>>> --
>>> >>>> View this message in context:
>>> >>>> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7607415
>>> >>>> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>> >>>>
>>> >>>>
>>> >>>>
>>> ---------------------------------------------------------------------
>>> >>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> >>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>> >>
>>> >>--
>>> >>View this message in context:
>>> >>http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7613046
>>> >>Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>>> >>
>>> >>
>>> >>---------------------------------------------------------------------
>>> >>To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> >>For additional commands, e-mail: java-dev-help@lucene.apache.org
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>> > For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>-- 
>>Nicolas LALEVÉE
>>Solutions & Technologies
>>ANYWARE TECHNOLOGIES
>>Tel : +33 (0)5 61 00 52 90
>>Fax : +33 (0)5 61 00 51 46
>>http://www.anyware-tech.com
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7645184
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
I am delighted I am no longer a voice in the wilderness. I couldn't agree
more with you Joaquin!
Victor

Joaquin Delgado-2 wrote:
> 
> Security should be the responsibility of the application. However let's
> make it clear that field level encryption is more a "means of" implementing
> security and therefore an infrastructure functionality that in my opinion
> Lucene should optionally provide, in the same way relational databases
> provide options (at a performance cost) to encrypt column values for very
> sensitive information.
> 
> That way, even if the data fell into the wrong hands and they knew how to
> read the Lucene index, they would not be able to see the values stored
> with it.
> 
> Think about credit card numbers for example ;-)
> 
> That's my two cents.
> 
> -- Joaquin
> 
> 
> -----Original Message-----
> From negrinv <vi...@gmail.com>
> Sent Fri 12/1/2006 1:22 PM
> To java-dev@lucene.apache.org
> Subject Re: Attached proposed modifications to Lucene 2.0 to support
> Field.Store.Encrypted
> 
> 
> That is a valid consideration Doron, which brings the discussion back to
> the difference between encryption and security. I believe that security is
> an end application responsibility, not Lucene's. For instance, is it
> possible to write the end application so that those stats are hidden from
> or inaccessible to users?
> Victor
> 
> 
> Doron Cohen wrote:
>> 
>> Robert Engels <re...@ix.netcom.com> wrote on 01/12/2006 09:34:12:
>>> ... decrypting such small payloads .. I think it is also subject to an
>> easy attack,
>> 
>> In addition, index statistics are still available, right?  So one can
>> know
>> how many docs, which (encrypted) words appear in which docs and exactly
>> where, and how often.  AFAIK, with a large enough index these statistics
>> can be useful for cracking the encryption.
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>> 
>> 
>> 
> 
> -- 
> View this message in context:
> http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7646459
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7647866
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by JOAQUIN DELGADO <JO...@ORACLE.COM>.
Security should be the responsibility of the application. However let's make it clear that field level encryption is more a "means of" implementing security and therefore an infrastructure functionality that in my opinion Lucene should optionally provide, in the same way relational databases provide options (at a performance cost) to encrypt column values for very sensitive information.

That way, even if the data fell into the wrong hands and they knew how to read the Lucene index, they would not be able to see the values stored with it.

Think about credit card numbers for example ;-)

That's my two cents.

-- Joaquin


-----Original Message-----
From negrinv <vi...@gmail.com>
Sent Fri 12/1/2006 1:22 PM
To java-dev@lucene.apache.org
Subject Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted


That is a valid consideration Doron, which brings the discussion back to the
difference between encryption and security. I believe that security is an end
application responsibility, not Lucene's. For instance, is it possible to
write the end application so that those stats are hidden from or
inaccessible to users?
Victor


Doron Cohen wrote:
> 
> Robert Engels <re...@ix.netcom.com> wrote on 01/12/2006 09:34:12:
>> ... decrypting such small payloads .. I think it is also subject to an
> easy attack,
> 
> In addition, index statistics are still available, right?  So one can know
> how many docs, which (encrypted) words appear in which docs and exactly
> where, and how often.  AFAIK, with a large enough index these statistics
> can be useful for cracking the encryption.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7646459
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Chris Hostetter <ho...@fucit.org>.
: That is a valid consideration Doron, which brings the discussion back to the
: difference between encryption and security. I believe that security is an end
: application responsibility, not Lucene's. For instance, is it possible to
: write the end application so that those stats are hidden from or
: inaccessible to users?

if you are relying on the application to provide the security (hiding the
stats that can allow you to "guess" at the encrypted terms) then why not
rely on the application to provide the encryption as well?

you can't claim you don't trust someone and then give them your car
keys...

if the goal is to encrypt both stored fields and indexed term enums in an
index in such a way that a malicious application can't obtain the
original field values or Terms then the Directory itself needs to be
encrypted (use either the lucene definition of Directory, or a filesystem
definition -- either one is fine) ... if you have faith that no malicious
applications will ever have access to your Directory then the Lucene API
layer doesn't need to care whether your data is encrypted or not -- and you
don't need to go to the effort of encrypting the TermEnums either ... let
your trusted Indexing code read in the encrypted data, decrypt it, and use
IndexWriter as is, let your trusted Searching code use the IndexSearcher
as is to read plain text info and encrypt the data before returning it to
clients.



-Hoss
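
As an illustration of "let the application provide the encryption", here is
a minimal sketch of a helper that encrypts a value before the application
hands it to the index as an ordinary stored string, and decrypts it after
retrieval. The class and method names are hypothetical, nothing here touches
Lucene classes, and AES-GCM from the standard JDK is an assumption chosen
for the sketch; it is not the RC4 used in the proposed patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical application-side helper; not part of Lucene or of the patch. */
final class StoredFieldCrypto {
    private static final int IV_BYTES = 12;   // nonce size commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns Base64(iv || ciphertext), which fits in a plain stored string field. */
    static String encrypt(byte[] key16, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh nonce per value, so equal values differ
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key16, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses encrypt(); fails if the stored value was tampered with. */
    static String decrypt(byte[] key16, String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key16, "AES"),
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // demo key only
        String stored = encrypt(key, "4111 1111 1111 1111");
        System.out.println(stored);
        System.out.println(decrypt(key, stored));
    }
}
```

Because each value gets a random nonce, two equal plaintexts produce
different stored strings. That suits stored-only fields; it would not work
for indexed terms that must match an encrypted query term, which is where
the statistics concern raised elsewhere in this thread comes in.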




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
That is a valid consideration Doron, which brings the discussion back to the
difference between encryption and security. I believe that security is an end
application responsibility, not Lucene's. For instance, is it possible to
write the end application so that those stats are hidden from or
inaccessible to users?
Victor


Doron Cohen wrote:
> 
> Robert Engels <re...@ix.netcom.com> wrote on 01/12/2006 09:34:12:
>> ... decrypting such small payloads .. I think it is also subject to an
> easy attack,
> 
> In addition, index statistics are still available, right?  So one can know
> how many docs, which (encrypted) words appear in which docs and exactly
> where, and how often.  AFAIK, with a large enough index these statistics
> can be useful for cracking the encryption.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
> 
> 
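
The statistics point quoted above can be made concrete with a small sketch.
The original post says the application must encrypt a term before searching
for it, which implies that equal terms always map to equal ciphertexts.
Under that assumption, anyone who can read the index can count ciphertexts
without the key. The names below are hypothetical, and AES in ECB mode is
used only as a convenient deterministic stand-in for the RC4 of the patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demonstration; not part of Lucene or of the proposed patch. */
final class TermFrequencyLeak {

    /** Deterministic term encryption: the same term always yields the same output. */
    static String encryptTerm(byte[] key16, String term) {
        try {
            // Assumption: ECB stands in for any scheme where equal terms encrypt equally.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key16, "AES"));
            byte[] ct = cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ct);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** What an observer without the key can compute: occurrences per encrypted term. */
    static Map<String, Integer> observedCounts(byte[] key16, String[] tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(encryptTerm(key16, token), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // all-zero demo key
        String[] tokens = "the cat and the dog and the bird".split(" ");
        // The ciphertexts are opaque, but their counts mirror the plaintext counts.
        observedCounts(key, tokens).forEach((ct, n) -> System.out.println(n + "  " + ct));
    }
}
```

For the eight tokens in the example, the observer sees five distinct
ciphertexts with counts 3, 1, 2, 1, 1, the same frequency profile as the
plaintext words. On a large index this profile can be matched against known
word frequencies for the language, which is the concern being raised.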

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7646459
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Nicolas Lalevée <ni...@anyware-tech.com>.
On Friday 1 December 2006 11:10, negrinv wrote:
> Nicolas Lalevée-2 wrote:
> > On Friday 1 December 2006 01:33, negrinv wrote:
> >> Thank you Robert for your comments. I am inclined to agree with you, but
> >> I would like to establish first of all if simplicity of implementation
> >> is the overriding consideration. But before I dwell on that let me say
> >> that I have discovered that I am not a master of DIFF file creation with
> >> Eclipse. The diff file attachment to my original posting is absurdly
> >> large and not correct. I have therefore attached a zip file containing
> >> the complete source code of the classes I modified. I leave it to others
> >> to extract the diffs properly.
> >> Back to the issue. So far the implementation has not been difficult
> >> considering that I knew nothing about Lucene internals before I started.
> >> The reason is that Lucene is very well structured and the changes just
> >> fitted nicely by adding some code in the right place with minimal
> >> changes to the existing code. But I admit that the proposed
> >> implementation so far is not complete and more work is required to
> >> overcome some of its restrictions. While I like your idea I believe that
> >> it imposes too large a granularity on the encrypted data: all fields
> >> with all kinds of data will be encrypted, including images and others
> >> which normally would be left alone, thus adding to the performance
> >> penalty due to encryption.
> >
> > I don't agree with you here. In Lucene, you will encrypt the field data,
> > the field names, and the tokens: I would say that represents at least 2/3
> > of the index size. Then, with the implementation you suggest, I think
> > (sorry, I didn't take the time to look at your patch) that every time
> > Lucene data needs to be read, it is decrypted again. With an encrypted
> > FS, your kernel will maintain a cache in RAM for you, so it won't hurt so
> > much.
> > It needs some benchmarking to see which is effectively the best, but I
> > doubt that your solution will be faster.
> >
> > Nicolas.
>
> Nicolas, I am all in favour of some tests to establish which solution is
> best, but I have to say that I don't believe file system or directory
> encryption in Lucene is really justified. Most operating systems already
> provide this feature, although as system-wide or policy-based
> solutions, hence not always within individual user control.
> But if the issue is user control, then I believe Lucene should provide
> maximum granularity when it comes to choice of data to encrypt.
> The issue I believe is whether some form of encryption should be provided
> within Lucene to enable application developers to create applications which
> offer some data protection under user control, with a minimum of impact,
> where by impact I mean both on performance and workload either in Lucene
> code or user code.

In fact you mean a user who has no control of his machine, and who cannot
encrypt his partition. Here you will have the issue with the swap: Lucene
will decrypt the data in RAM, which can possibly be pushed to swap... I know
this is extreme, but it's a security hole.

-- 
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.

Nicolas Lalevée-2 wrote:
> 
> On Friday 1 December 2006 01:33, negrinv wrote:
>> Thank you Robert for your comments. I am inclined to agree with you, but
>> I would like to establish first of all if simplicity of implementation is
>> the overriding consideration. But before I dwell on that let me say that
>> I have discovered that I am not a master of DIFF file creation with
>> Eclipse. The diff file attachment to my original posting is absurdly
>> large and not correct. I have therefore attached a zip file containing
>> the complete source code of the classes I modified. I leave it to others
>> to extract the diffs properly.
>> Back to the issue. So far the implementation has not been difficult
>> considering that I knew nothing about Lucene internals before I started.
>> The reason is that Lucene is very well structured and the changes just
>> fitted nicely by adding some code in the right place with minimal changes
>> to the existing code. But I admit that the proposed implementation so far
>> is not complete and more work is required to overcome some of its
>> restrictions. While I like your idea I believe that it imposes too large
>> a granularity on the encrypted data: all fields with all kinds of data
>> will be encrypted, including images and others which normally would be
>> left alone, thus adding to the performance penalty due to encryption.
> 
> I don't agree with you here. In Lucene, you will encrypt the field data,
> the field names, and the tokens: I would say that represents at least 2/3
> of the index size. Then, with the implementation you suggest, I think
> (sorry, I didn't take the time to look at your patch) that every time
> Lucene data needs to be read, it is decrypted again. With an encrypted FS,
> your kernel will maintain a cache in RAM for you, so it won't hurt so much.
> It needs some benchmarking to see which is effectively the best, but I
> doubt that your solution will be faster.
> 
> Nicolas.
> 
> 
> 

Nicolas, I am all in favour of some tests to establish which solution is
best, but I have to say that I don't believe file system or directory
encryption in Lucene is really justified. Most operating systems already
provide this feature, although as system-wide or policy-based
solutions, hence not always within individual user control.
But if the issue is user control, then I believe Lucene should provide
maximum granularity when it comes to choice of data to encrypt.
The issue I believe is whether some form of encryption should be provided
within Lucene to enable application developers to create applications which
offer some data protection under user control, with a minimum of impact,
where by impact I mean both on performance and workload either in Lucene code
or user code.
I cannot test the performance issues until there is an alternative solution
in place. If you have one and you can make it available I will be happy to
give it an impartial test.
Victor

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7636352
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Nicolas Lalevée <ni...@anyware-tech.com>.
On Friday 1 December 2006 01:33, negrinv wrote:
> Thank you Robert for your comments. I am inclined to agree with you, but I
> would first like to establish whether simplicity of implementation is the
> overriding consideration. Before I dwell on that, let me say that I have
> discovered that I am not a master of diff file creation with Eclipse. The
> diff file attached to my original posting is absurdly large and not
> correct. I have therefore attached a zip file containing the complete
> source code of the classes I modified. I leave it to others to extract the
> diffs properly.
> Back to the issue. So far the implementation has not been difficult,
> considering that I knew nothing about Lucene internals before I started.
> The reason is that Lucene is very well structured, and the changes fitted
> nicely by adding some code in the right place with minimal changes
> to the existing code. But I admit that the proposed implementation so far
> is not complete and more work is required to overcome some of its
> restrictions. While I like your idea, I believe that it imposes too coarse
> a granularity on the encrypted data: all fields with all kinds of data will
> be encrypted, including images and others which normally would be left
> alone, thus adding to the performance penalty due to encryption.

I don't agree with you here. In Lucene, you will encrypt the field data, the
field names, and the tokens: I would say that represents at least 2/3 of
the index size. Then, with the implementation you suggest, I think (sorry, I
didn't take the time to look at your patch) that every time Lucene data needs
to be read, it is decrypted again. With an encrypted FS, your kernel will
maintain a cache in RAM for you, so it won't hurt so much.
It needs some benchmarking to see which is actually the best, but I doubt that
your solution will be faster.

Nicolas.

> Many hardware devices and most operating systems already provide directory
> or file system encryption, so that level of encryption appears to me an
> unnecessary addition to Lucene. Encryption at field level, however, is not
> provided by anything I know of. The key, in my opinion, is to decide what
> is best from the end user's point of view, but perhaps we need more
> discussion on this.
> Victor
>
> http://www.nabble.com/file/4390/LuceneEncryptionMods.zip
> LuceneEncryptionMods.zip

-- 
Nicolas LALEVÉE
Solutions & Technologies
ANYWARE TECHNOLOGIES
Tel : +33 (0)5 61 00 52 90
Fax : +33 (0)5 61 00 51 46
http://www.anyware-tech.com



Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
Thank you Robert for your comments. I am inclined to agree with you, but I
would first like to establish whether simplicity of implementation is the
overriding consideration. Before I dwell on that, let me say that I have
discovered that I am not a master of diff file creation with Eclipse. The
diff file attached to my original posting is absurdly large and not
correct. I have therefore attached a zip file containing the complete source
code of the classes I modified. I leave it to others to extract the diffs
properly.
Back to the issue. So far the implementation has not been difficult,
considering that I knew nothing about Lucene internals before I started. The
reason is that Lucene is very well structured, and the changes fitted
nicely by adding some code in the right place with minimal changes to the
existing code. But I admit that the proposed implementation so far is not
complete and more work is required to overcome some of its restrictions.
While I like your idea, I believe that it imposes too coarse a granularity on
the encrypted data: all fields with all kinds of data will be encrypted,
including images and others which normally would be left alone, thus adding
to the performance penalty due to encryption. Many hardware devices and most
operating systems already provide directory or file system encryption,
so that level of encryption appears to me an unnecessary addition to
Lucene. Encryption at field level, however, is not provided by anything I
know of. The key, in my opinion, is to decide what is best from the end
user's point of view, but perhaps we need more discussion on this.
Victor

http://www.nabble.com/file/4390/LuceneEncryptionMods.zip
LuceneEncryptionMods.zip 


Robert Engels wrote:
> 
> I think a simpler solution would be to create an EncryptedDirectory
> implementation of Directory, which requires a password to open/modify the
> directory.
> 
> Far simpler, and if you are using encryption to begin with, you are
> probably encrypting most of the data anyway.

-- 
View this message in context: http://www.nabble.com/Attached-proposed-modifications-to-Lucene-2.0-to-support-Field.Store.Encrypted-tf2727614.html#a7631251
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Luke Nezda <ln...@gmail.com>.
Victor-
Your point is well taken that a comprehensive encryption strategy is not
quite analogous to compression: it involves more than a transformation
of field values, since it requires (at a minimum) that all
data structures which comprise the index be encrypted too.  Maybe I spoke too
soon.

However, after considering this more, I think the scheme would need to be
quite invasive to provide good security.  I think just plugging in
encryption simplistically would be very vulnerable to side channel attacks.
It seems an attacker can get clear text terms encrypted via the particular
index's QueryParser implementation and eventually build a fairly complete
decryption lookup table using Lucene's data structures, thus undermining
the security of the internal data structures. Encrypted payloads would
potentially be unaffected, unless they corresponded to index Terms.
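
The lookup-table concern above can be illustrated with a small sketch. It is
not taken from the patch under discussion: the class LookupTableAttack, its
methods, and the use of AES in ECB mode as a stand-in for "any deterministic
term encryption" are all assumptions made for illustration. The only point
it shows is that if equal terms always encrypt to equal values, anyone who
can submit terms through the encrypting query path can map ciphertext back
to plaintext without ever knowing the key.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demo class; none of these names exist in Lucene. */
final class LookupTableAttack {

    /**
     * Builds a ciphertext -> plaintext table by pushing guessed terms
     * through the oracle (e.g. an encrypting query path).
     */
    static Map<String, String> buildTable(Function<String, String> oracle,
                                          List<String> guesses) {
        Map<String, String> table = new HashMap<>();
        for (String guess : guesses) {
            table.put(oracle.apply(guess), guess);
        }
        return table;
    }

    /**
     * Stand-in for the application's deterministic term encryption.
     * Assumption: AES/ECB is used here only because it is deterministic;
     * the attacker calls the returned function but never sees rawKey.
     */
    static Function<String, String> demoOracle(byte[] rawKey) {
        SecretKeySpec key = new SecretKeySpec(rawKey, "AES");
        return term -> {
            try {
                Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
                cipher.init(Cipher.ENCRYPT_MODE, key);
                byte[] out = cipher.doFinal(term.getBytes(StandardCharsets.UTF_8));
                return Base64.getEncoder().encodeToString(out);
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        };
    }
}
```

An encrypted term read from the index can then be looked up in the table; any
term that was among the guesses is recovered, while unguessed terms stay
opaque.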

Let's say this weakness is OK with you.  Using the current API, I think you
can achieve your ends by storing encrypted binary field values and adding a
trailing org.apache.lucene.analysis.TokenFilter, used at index and query
time, that encrypts and Base64 encodes its input (term text has to be a
String).  This would effectively give you an encrypted form of Lucene's
internal data structures.
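
The per-token transformation such a filter would apply can be sketched with
the JDK alone. This is a minimal sketch, not a Lucene TokenFilter: the class
TermEncryptor and its methods are hypothetical, java.util.Base64 requires
Java 8 or later (well after the Lucene 2.0 era), and AES in ECB mode is an
assumption chosen only because term matching needs equal input to give equal
output. That determinism is exactly the weakness described above.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: encrypts a term, then Base64 encodes it into a String. */
final class TermEncryptor {
    private final SecretKeySpec key;

    /** rawKey must be 16, 24 or 32 bytes long, as required by AES. */
    TermEncryptor(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Returns the encrypted term as Base64 text, usable as an index term. */
    String encode(String term) {
        byte[] encrypted = run(Cipher.ENCRYPT_MODE, term.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(encrypted);
    }

    /** Reverses encode(); only possible for a holder of the key. */
    String decode(String encoded) {
        byte[] decrypted = run(Cipher.DECRYPT_MODE, Base64.getDecoder().decode(encoded));
        return new String(decrypted, StandardCharsets.UTF_8);
    }

    private byte[] run(int mode, byte[] input) {
        try {
            // ECB is deterministic: the same term always yields the same output.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, key);
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same encode() call would have to be applied to each token at index time
and to each query term at search time so that the two sides match.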

In addition to my security concerns with the concept, I also still agree
with the philosophical objections raised so far on the
related field compression topic.  It seems inevitable to me that if
encryption support were added, application developers would eventually try
to sell Lucene developers on adding features to it, on top of supporting
and maintaining it (à la configurable compression quality factor).  A
configurable, encrypting Base64 TokenFilter would also be a cool contrib.

Luke

On 11/29/06, negrinv <vi...@gmail.com> wrote:
>
>
> Thank you Luke for your comments and the references you supplied. I read
> through them and reached the following conclusions. There seems to be a
> philosophical issue about the boundary between a user application and the
> Lucene API: where should one stop and the other start?
> The other issue is the significant difference between compression and
> encryption.
> As far as the first issue is concerned, it is really a matter of personal
> choice and preference. My feeling is that as long as adding functionality
> does not impair the performance of the API as a whole, it makes sense to
> add it to Lucene and thus simplify the task of the application developer.
> After all, application developers do not have to use all the features of
> the API and always have the option of subclassing, writing a better version
> of it if they can, or writing the functionality as part of the application,
> even if the API provides that functionality already. The API is there to
> make life easier for those developers who want to use it; nobody "has" to
> use it.
> The second issue is more technical. Compression simply compresses the
> stored data to save storage. The index itself is not compressed, therefore
> searching proceeds as normal. With encryption, however, you must encrypt
> the index as well as the stored data; otherwise one could reconstruct the
> source document from the index and thus defeat the purpose of encryption.
> Correct me if I am wrong, but I think that encrypting the Lucene index is
> not easy to achieve from outside of Lucene: it implies re-writing, as part
> of the application, much code now part of Lucene (see issue number one
> above), hence my preference for including it as part of the Lucene API
> rather than as part of the application.
> Victor

Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by negrinv <vi...@gmail.com>.
Thank you Luke for your comments and the references you supplied. I read
through them and reached the following conclusions. There seems to be a
philosophical issue about the boundary between a user application and the
Lucene API: where should one stop and the other start?
The other issue is the significant difference between compression and
encryption.
As far as the first issue is concerned, it is really a matter of personal
choice and preference. My feeling is that as long as adding functionality
does not impair the performance of the API as a whole, it makes sense to add
it to Lucene and thus simplify the task of the application developer. After
all, application developers do not have to use all the features of the API
and always have the option of subclassing, writing a better version of it if
they can, or writing the functionality as part of the application, even if
the API provides that functionality already. The API is there to make life
easier for those developers who want to use it; nobody "has" to use it.
The second issue is more technical. Compression simply compresses the stored
data to save storage. The index itself is not compressed, therefore searching
proceeds as normal. With encryption, however, you must encrypt the index as
well as the stored data; otherwise one could reconstruct the source document
from the index and thus defeat the purpose of encryption. Correct me if I am
wrong, but I think that encrypting the Lucene index is not easy to achieve
from outside of Lucene: it implies re-writing, as part of the application,
much code now part of Lucene (see issue number one above), hence my
preference for including it as part of the Lucene API rather than as part of
the application.
Victor
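
To illustrate the point above about reconstructing a document from an
unencrypted index, here is a minimal sketch. It is a toy positional
inverted index written for this example, not Lucene's actual data
structures or file format; the class and method names are hypothetical.
It only shows that if terms and their positions are stored in the clear,
the token stream can be rebuilt even when the stored field itself is
unreadable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy positional inverted index (hypothetical, not Lucene's own classes). */
final class PositionalIndexDemo {

    /** Maps each lower-cased term to the token positions where it occurs. */
    static Map<String, List<Integer>> index(String text) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    /** Rebuilds the token stream using nothing but the postings. */
    static String reconstruct(Map<String, List<Integer>> postings) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int pos : entry.getValue()) {
                byPosition.put(pos, entry.getKey());
            }
        }
        // TreeMap iterates in ascending position order.
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = index("Transfer the funds to the account");
        System.out.println(postings);
        System.out.println(reconstruct(postings));
    }
}
```

Case and original whitespace are lost here, much as an analyzer would
discard them, but the wording of the text survives. An index that stores
terms without positions would leak less, although the vocabulary of each
document would still be visible.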




Re: Attached proposed modifications to Lucene 2.0 to support Field.Store.Encrypted

Posted by Luke Nezda <ln...@gmail.com>.
I think that adding encryption support to Lucene fields is a bad idea for
the same reasons adding compression was a bad idea (see the concluding
comments at the tail of this issue:
http://issues.apache.org/jira/browse/LUCENE-648?page=all).  Binary fields
can be used by users to achieve this end.  Maybe a contrib module with
utility methods would be a compromise that preserves this work and makes it
accessible to others, or alternatively just a FAQ entry with the sample
code or references to it.
Luke
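
A rough sketch of the binary-field route suggested above: the application
encrypts the value itself and hands Lucene only the resulting bytes. Note
the assumptions. The proposal under discussion uses RC4 from BouncyCastle;
this sketch swaps in AES-GCM from the JDK's javax.crypto so that it runs
without third-party jars. The class and method names are hypothetical, and
the Lucene call mentioned in the comment is assumed to be the binary Field
constructor, so check it against the Lucene version in use.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical helper: encrypt a field value in the application. */
final class FieldCrypto {
    private static final int IV_BYTES = 12;   // random nonce stored with the value
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns IV followed by ciphertext and tag, ready to store as bytes. */
    static byte[] encrypt(SecretKey key, String plaintext) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] stored = new byte[iv.length + sealed.length];
            System.arraycopy(iv, 0, stored, 0, iv.length);
            System.arraycopy(sealed, 0, stored, iv.length, sealed.length);
            return stored;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses encrypt(); fails if the stored bytes were tampered with. */
    static String decrypt(SecretKey key, byte[] stored) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] stored = encrypt(key, "sensitive note");
        // Assumption: these bytes would then go into a stored binary field,
        // e.g. new Field("body", stored, Field.Store.YES) in Lucene 2.0.
        System.out.println(decrypt(key, stored));
    }
}
```

Unlike RC4, this scheme does not preserve length: each stored value grows
by 28 bytes (12 for the IV, 16 for the tag). It also protects only the
stored value. Any terms indexed from the same text stay in the clear, which
is the gap the in-core proposal is trying to close.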
