Posted to user@lucenenet.apache.org by Hans Merkl <hm...@hmerkl.com> on 2010/02/25 22:16:55 UTC

How to secure/encrypt a Lucene index?

Hi, I am working on a desktop app that will use Lucene as its search engine.
The app will be installed on the user's machine and the index will be stored
on the local hard disk.

The data is potentially confidential so I would like to protect the index
from unauthorized access. The data needs to be secure even when the user's
machine gets stolen.

2 approaches I have come up with so far:

- Use Windows NTFS encryption. Should be secure unless the unauthorized
person knows how to log in as the user that created the index.
- Use TrueCrypt. This should be very safe but it requires the installation
of TrueCrypt and administrative rights to install the encrypted drive.

The application will be distributed to many users so I would like to keep
the installation as simple as possible.

Does anybody have experience with this scenario? Right now I think the
easiest approach would be NTFS encryption. What do you think?

Thanks!




Re: How to secure/encrypt a Lucene index?

Posted by Shashi Kant <sk...@sloan.mit.edu>.
Perhaps the easiest thing to do is to encrypt the index folder using
an algorithm such as Blowfish, then decrypt it on application launch and
load it into a RAMDirectory. I implemented something similar in the
past and it worked out OK.
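
The encrypt-on-disk, decrypt-at-launch idea above could be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's implementation: AES is swapped in for Blowfish because the .NET base class library ships AES but no Blowfish, the key is derived from a password the user types at launch, and IndexVault with all of its members is a hypothetical helper that is not part of Lucene.NET. It assumes .NET Framework 4.7.2 or later for the Rfc2898DeriveBytes overload used.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of Lucene.NET.
public static class IndexVault
{
    private const int SaltSize = 16;
    private const int IvSize = 16;

    // Derive a 256-bit AES key from the password entered at launch.
    private static byte[] DeriveKey(string password, byte[] salt)
    {
        using (var kdf = new Rfc2898DeriveBytes(password, salt, 100000, HashAlgorithmName.SHA256))
            return kdf.GetBytes(32);
    }

    private static byte[] RandomBytes(int count)
    {
        var bytes = new byte[count];
        using (var rng = RandomNumberGenerator.Create())
            rng.GetBytes(bytes);
        return bytes;
    }

    // Output layout: salt (16 bytes) + IV (16 bytes) + AES-CBC ciphertext.
    public static byte[] Encrypt(byte[] plain, string password)
    {
        byte[] salt = RandomBytes(SaltSize);
        byte[] iv = RandomBytes(IvSize);
        using (var aes = Aes.Create())
        {
            aes.Key = DeriveKey(password, salt);
            aes.IV = iv;
            using (var enc = aes.CreateEncryptor())
            {
                byte[] cipher = enc.TransformFinalBlock(plain, 0, plain.Length);
                var output = new byte[SaltSize + IvSize + cipher.Length];
                Buffer.BlockCopy(salt, 0, output, 0, SaltSize);
                Buffer.BlockCopy(iv, 0, output, SaltSize, IvSize);
                Buffer.BlockCopy(cipher, 0, output, SaltSize + IvSize, cipher.Length);
                return output;
            }
        }
    }

    // Reverses Encrypt: reads salt and IV from the header, then decrypts the rest.
    public static byte[] Decrypt(byte[] blob, string password)
    {
        var salt = new byte[SaltSize];
        var iv = new byte[IvSize];
        Buffer.BlockCopy(blob, 0, salt, 0, SaltSize);
        Buffer.BlockCopy(blob, SaltSize, iv, 0, IvSize);
        using (var aes = Aes.Create())
        {
            aes.Key = DeriveKey(password, salt);
            aes.IV = iv;
            using (var dec = aes.CreateDecryptor())
                return dec.TransformFinalBlock(blob, SaltSize + IvSize,
                                               blob.Length - SaltSize - IvSize);
        }
    }

    // Writes an encrypted copy of every file in plainDir into encryptedDir.
    public static void EncryptFolder(string plainDir, string encryptedDir, string password)
    {
        Directory.CreateDirectory(encryptedDir);
        foreach (string path in Directory.GetFiles(plainDir))
            File.WriteAllBytes(Path.Combine(encryptedDir, Path.GetFileName(path)),
                               Encrypt(File.ReadAllBytes(path), password));
    }

    // Decrypts every file into memory, keyed by file name; no plaintext is written to disk.
    public static Dictionary<string, byte[]> LoadIntoMemory(string encryptedDir, string password)
    {
        var files = new Dictionary<string, byte[]>();
        foreach (string path in Directory.GetFiles(encryptedDir))
            files[Path.GetFileName(path)] = Decrypt(File.ReadAllBytes(path), password);
        return files;
    }
}
```

Copying the decrypted byte arrays into a RAMDirectory is left out on purpose, since the calls for writing files into a directory differ between Lucene.NET versions. The sketch also does not authenticate the ciphertext, so tampering is not reliably detected, and the whole index has to fit in memory.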




RE: How to secure/encrypt a Lucene index?

Posted by "Nicholas Paldino [.NET/C# MVP]" <ca...@caspershouse.com>.
Hans,

	While I've seen other responses here, you haven't indicated exactly
^what^ constitutes "unauthorized access".  Does that mean someone who can
authenticate against the domain to be a certain user, or some other
criteria?

	You need to define the threats, the surface area for attacks, etc,
etc.

	It would seem like you want to use an additional shared-secret in
order to access the data, which would mean that you have to query for this
shared secret in your application, no matter what encryption technology is
used.  If you are using passwords, then you need to enforce password
strength; things such as minimum lengths, use of non-alphanumeric
characters, checks against frequency of characters in the password and
dictionary checks should be standard.
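
The password checks listed above could be sketched along these lines. Everything here is an assumption made for illustration: the PasswordPolicy class, the minimum length of 10, the one-third frequency threshold, and the tiny placeholder word list are not from the thread, and a real dictionary check would use a far larger list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; thresholds and word list are illustrative only.
public static class PasswordPolicy
{
    // Placeholder word list; a real check would load a much larger dictionary.
    private static readonly HashSet<string> CommonWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "password", "secret", "letmein", "qwerty" };

    // Returns a list of problems; an empty list means every check passed.
    public static List<string> Check(string password, int minLength = 10)
    {
        var problems = new List<string>();

        // Minimum length.
        if (password.Length < minLength)
            problems.Add("too short");

        // At least one non-alphanumeric character.
        if (password.All(c => char.IsLetterOrDigit(c)))
            problems.Add("no non-alphanumeric character");

        // Character frequency: flag a single character that makes up
        // more than a third of the password.
        if (password.Length > 0)
        {
            int maxCount = password.GroupBy(c => c).Max(g => g.Count());
            if (maxCount * 3 > password.Length)
                problems.Add("one character repeated too often");
        }

        // Dictionary check on the letters only, so "secret!" still matches "secret".
        string letters = new string(password.Where(c => char.IsLetter(c)).ToArray());
        if (CommonWords.Contains(letters))
            problems.Add("based on a dictionary word");

        return problems;
    }
}
```

For instance, PasswordPolicy.Check("password") reports three problems (too short, no non-alphanumeric character, dictionary word), while a longer mixed phrase with punctuation passes all four checks.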

	Or, you could use a smart card with a client certificate as the
shared secret, or any combination of things (biometrics, etc, etc).

	The point is, until you define what you have, and what the
requirements are ("protect the index from unauthorized access" is just way too
vague), you're just stabbing in the dark. 

		- Nick




Re: How to secure/encrypt a Lucene index?

Posted by Nicholas Petersen <np...@gmail.com>.
<If you use the System.Security.Cryptography functions on a word, it _should_ remain
the same when you run it a second time.  This means you could encrypt each
word, then do a search on the encrypted phrase.   This way, you'd have a
Lucene index of encrypted words.>

Just wanted to say, that's a great idea -- at least as a last resort.  <...
it _should_ remain the same when you run it a second time.> If it doesn't,
it's because the cryptography was bogus.  The index would, however, become
much larger.  This could be helped if you could store the encrypted words as
the native byte[] produced by encryption (perhaps chunking the array into
Int64s), but from my read of Lucene in Action, it seems they said
searching on types other than string isn't really supported (storage yes,
searching, no) ... but that could be wrong. Otherwise you'd need to
convert the encrypted byte[] into Base64.  See this example, which becomes
4x larger ("a word" with 6 chars becomes a 24-char result; these are functions
from the library DotNetExtensions, www.dotnetextensions.com):

string encrypted = "a word".EncryptToBase64("secret!"); // "DY9oTzZkaTxbXmbSH/+SfA==", 24 chars

byte[] encryptedBytes = "a word".Encrypt("secret!"); // encryptedBytes is a lot shorter at 16, but still 2 1/2x larger....
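
The size figures above (6 chars, 16 bytes, 24 Base64 chars) can be reproduced with a little arithmetic. The sketch below assumes the library uses a cipher with a 16-byte block and PKCS7-style padding, such as AES; that is a guess that matches the 16-byte result, not something the thread confirms. CipherSizes and its methods are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper for estimating how much an encrypted term grows.
public static class CipherSizes
{
    // Assumed: a 16-byte block cipher with PKCS7 padding, which rounds up to the
    // next full block and adds a whole extra block when the input is already aligned.
    public static int PaddedLength(int plainBytes)
    {
        return (plainBytes / 16 + 1) * 16;
    }

    // Base64 emits 4 characters for every 3 input bytes, rounded up.
    public static int Base64Length(int byteCount)
    {
        return (byteCount + 2) / 3 * 4;
    }

    // Number of Base64 characters needed to store one encrypted word.
    public static int EncryptedTermLength(string word)
    {
        int plainBytes = Encoding.UTF8.GetByteCount(word);
        return Base64Length(PaddedLength(plainBytes));
    }
}
```

Under these assumptions, "a word" (6 bytes) pads to 16 bytes and encodes to 24 Base64 characters, and any word of 16 to 31 bytes pads to 32 bytes and encodes to 44 characters, so short words see the largest relative growth.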

Nick


Re: How to secure/encrypt a Lucene index?

Posted by Trevor Watson <tw...@datassimilate.com>.
Just throwing ideas your way in general: if you use the 
System.Security.Cryptography functions on a word, it _should_ remain the 
same when you run it a second time.  This means you could encrypt each 
word, then do a search on the encrypted phrase.   This way, you'd have a 
Lucene index of encrypted words.

Then if you are using the Lucene index as the data store as well, you 
could potentially decrypt each field word by word as well.
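
A minimal sketch of the encrypt-each-word idea, under stated assumptions: TermCipher and EncryptedTermIndex are hypothetical classes, and a plain dictionary stands in for the Lucene index so that no Lucene.NET calls have to be guessed. AES only gives the same output twice when both the key and the IV are fixed, so the sketch uses a constant IV; with a random IV per call the tokens would differ and could not be searched.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: deterministic AES, so equal words give equal tokens.
public sealed class TermCipher
{
    // Constant IV makes encryption repeatable, at the cost of revealing
    // which stored words are equal to each other.
    private static readonly byte[] FixedIv = new byte[16];
    private readonly byte[] key;

    public TermCipher(byte[] key) { this.key = key; } // 16, 24 or 32 bytes

    public string Encrypt(string word)
    {
        return Convert.ToBase64String(Transform(Encoding.UTF8.GetBytes(word), true));
    }

    public string Decrypt(string token)
    {
        return Encoding.UTF8.GetString(Transform(Convert.FromBase64String(token), false));
    }

    private byte[] Transform(byte[] input, bool encrypt)
    {
        using (var aes = Aes.Create())
        {
            aes.Key = key;
            aes.IV = FixedIv;
            using (var t = encrypt ? aes.CreateEncryptor() : aes.CreateDecryptor())
                return t.TransformFinalBlock(input, 0, input.Length);
        }
    }
}

// Stand-in for the index: maps each encrypted word to the documents containing it.
public sealed class EncryptedTermIndex
{
    private readonly TermCipher cipher;
    private readonly Dictionary<string, HashSet<int>> postings =
        new Dictionary<string, HashSet<int>>();

    public EncryptedTermIndex(TermCipher cipher) { this.cipher = cipher; }

    public void Add(int docId, string text)
    {
        string[] words = text.ToLowerInvariant()
                             .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            string token = cipher.Encrypt(word);
            HashSet<int> docs;
            if (!postings.TryGetValue(token, out docs))
            {
                docs = new HashSet<int>();
                postings[token] = docs;
            }
            docs.Add(docId);
        }
    }

    // The query word is encrypted the same way, then looked up as an exact match.
    public ICollection<int> Search(string word)
    {
        HashSet<int> docs;
        return postings.TryGetValue(cipher.Encrypt(word.ToLowerInvariant()), out docs)
            ? docs
            : new HashSet<int>();
    }
}
```

Only exact-term lookups work this way. Prefix, wildcard, fuzzy, and range queries would not, because the ciphertext preserves neither prefixes nor ordering, and an attacker who obtains the index can still see how often each token occurs.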


We have a program that temporarily stores data on a user's computer using 
System.IO.IsolatedStorage.  If you could decrypt the entire Lucene index 
each time the program starts, you could store it in there and access it 
(so long as the index isn't overly large so the start-up decrypt doesn't 
take too long).

     If IsolatedStorage doesn't work for a Lucene index, you could 
create a RAM disk or a temporary disk to hold the Lucene index while 
accessing it. You could look at Dokan (dokan-dev.net; there's a .NET 
wrapper for it) to create a temporary drive on your computer and 
decrypt the index into it.

Trevor Watson

