Posted to user@lucenenet.apache.org by Andrew Schuler <an...@gmail.com> on 2010/02/26 18:16:11 UTC

Lucene index file container

The discussion about encrypting an index has me thinking about a current use
I have for Lucene.net. I'm building a small app with a static index
distributed with it. Can anyone recommend a way to package the index into
say some type of file container for inclusion in an installer package?

-andy

RE: Lucene index file container

Posted by Hans Merkl <hm...@hmerkl.com>.
You can put the files into a zip file and have your app unpack the files.
It's very easy to do this from a .NET app.

Just an idea.

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 2:47 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Yes, that is do-able. I was just thinking it would be cleaner to wrap the
indexes (there will be more than one) in some sort of file container. One of
the things I'd like to do is allow the user to download pre-packaged indexes
and load them into the app. This would be easier with a single file than
with a directory of files, no?


On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:

> Can't you add all the files in the index directory to the installer
> package?
> This should be pretty straightforward.
>
> -----Original Message-----
> From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> Sent: Friday, February 26, 2010 12:16 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Lucene index file container
>
> The discussion about encrypting an index has me thinking about a current
> use
> I have for Lucene.net. I'm building a small app with a static index
> distributed with it. Can anyone recommend a way to package the index into
> say some type of file container for inclusion in an installer package?
>
> -andy
>
>
>



RE: Lucene index file container

Posted by "Nicholas Paldino [.NET/C# MVP]" <ca...@caspershouse.com>.
Digy,

	Yes, at least with DeflateStream and GZipStream, you would have to
close the stream, open it again, and then read forward to the appropriate
place in the uncompressed stream, incurring the overhead I made reference to
in previous emails.
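
To make the read-forward cost concrete, here is a minimal sketch. It is
written in Python rather than .NET purely to keep it short and
self-contained: Python's gzip module stands in for GZipStream, and the
helper name read_block and the 1 MB test payload are assumptions made up for
this illustration. It counts how many uncompressed bytes have to be produced
to serve a single 4K read at a given offset.

```python
import gzip
import io


def read_block(gz_bytes, offset, length, chunk=64 * 1024):
    """Read `length` bytes at uncompressed `offset` from a gzip stream.

    Returns (data, inflated), where `inflated` is the number of
    uncompressed bytes that had to be produced to serve the read.
    """
    inflated = 0
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        # A gzip/DEFLATE stream carries no index, so everything before
        # `offset` must be decompressed and thrown away.
        remaining = offset
        while remaining > 0:
            skipped = stream.read(min(chunk, remaining))
            if not skipped:
                raise EOFError("offset is past the end of the stream")
            remaining -= len(skipped)
            inflated += len(skipped)
        data = stream.read(length)
        inflated += len(data)
    return data, inflated


if __name__ == "__main__":
    # Hypothetical payload: 1,000,000 bytes of a repeating pattern.
    payload = bytes(i % 251 for i in range(1_000_000))
    block, cost = read_block(gzip.compress(payload), 900_000, 4096)
    print(len(block), cost)
```

Every seek that lands behind the current position repeats this work from
the start of the stream, which is why running an index directly out of a
compressed stream is expected to be slow for random access.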

	None of the other libraries that I've seen offer a seekable zip
stream.

	I've also pointed out the IMO wasted overhead in zipping/unzipping
indexes when space isn't a concern, given that the index is going to be
accessed on a read/write basis more than you are going to have to zip it for
transport at any particular time.

	To that end, Andrew doesn't have an option, as it's not really worth
it to go through the motions of trying to write a seekable zip stream
implementation (by his own admission, he doesn't have space concerns).

	What I haven't seen is any reaction to the obvious solution in the
event that space *is* an issue, just setting the compression flag on the
folder the indexes are kept in.  That will compress the files on disk, and
all the APIs remain intact.  It's an instant win.

		- Nick

	

-----Original Message-----
From: Digy [mailto:digydigy@gmail.com] 
Sent: Saturday, February 27, 2010 3:15 AM
To: lucene-net-user@lucene.apache.org
Subject: RE: Lucene index file container

Hi Nick,

> "If the libraries are offering you a read/write stream which is seekable,"
As I mentioned in my first mail, that is the problem. They are not seekable.

You have to uncompress 900M of data to reach offset 900M.

So, to avoid reading from 0 to the offset whenever a seek request is made,
you have to unzip the whole file at the beginning and use that in your app.
But this is not what I understand from Andrew's question:
"Does any one have experience running an index directly out of zip file?"


DIGY

-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com] 
Sent: Saturday, February 27, 2010 2:09 AM
To: lucene-net-user@lucene.apache.org
Subject: RE: Lucene index file container

Digy,

	You are *always* going to incur the CPU cost because in order to get
to an offset in the ^uncompressed^ file, you have to process the 100MB file
and translate it into the 1GB file.  That cost is always incurred no matter
what.

	Now, what you do with that 1GB and what the libraries do and how
they expose it is implementation-dependent.

	If the libraries just stream the uncompressed data back to you in a
forward-only, read-only way, then it's up to the library consumer to take
care of that in some way.  This usually means keeping it in memory (in which
case, you have to worry about excessive memory consumption) or writing it to
disk (in which case, you incur I/O costs).

	If the libraries are offering you a read/write stream which is
seekable, then it becomes completely implementation-dependent.  It might
very well use temp files, which incur I/O costs, or place data in memory (or
memory mapped files), which incurs a memory (and possibly I/O) cost.  I
don't have details about the specific libraries, but that's generally what
you are looking at in terms of strategies for providing this kind of
functionality.

		- Nick

-----Original Message-----
From: Digy [mailto:digydigy@gmail.com] 
Sent: Friday, February 26, 2010 6:19 PM
To: lucene-net-user@lucene.apache.org
Subject: RE: Lucene index file container

Hi Nick,
Suppose that I have a 1G file with a compressed size of 100M and I want to
read just a 4K block from offset 900M.
Considering SharpZipLib, DotNetZip or similar libraries, would the cost be
more CPU and less IO? Or more CPU and more IO?

DIGY

-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com] 
Sent: Saturday, February 27, 2010 12:49 AM
To: lucene-net-user@lucene.apache.org
Subject: RE: Lucene index file container

Andrew,

	If you are going to unpack the index into a temp directory and then
repack the file when you are done, then you are going to incur a cost
on startup and on teardown of the process which is mainly I/O and CPU bound
(I/O because you have to read the zip file from disk and then write the
unpacked file from the zip to another location, and CPU bound because you
are translating the byte stream while unpacking).

	That approach doesn't do anything but add that additional I/O and
CPU overhead on startup.  The "big win" for compressing the file is to save
space on disk, or whatever medium the byte stream is being persisted to.

	If all you do is unzip the file in the beginning and zip it up at
the end, then from your app's point of view, you do a lot of extra work for
nothing.  Unless you have real disk space issues, I'd recommend against
this.

	Now, if you were to create a new Directory class which uses a
GZipStream or DeflateStream as a façade over the FileStream which writes to
disk, then you are reaping the benefits of compressing the file.  The index
will always be compressed on disk and you are realizing the gains.

	The cost of doing this, however, is more CPU time (to perform the
translation) but with a gain of fewer I/O operations to disk (since there are
fewer bytes being written to disk).

	Depending on how much activity you have on reading/writing to/from
the index it might or might not make an impact.  You have to measure that
yourself given your application's use of the index.

	If file size is ^truly^ a concern, have you considered just setting
the compression flag on the *folder* that contains the index files?  Any
files that are added/updated/deleted will automatically be compressed if the
flag is set on the folder, so doing it in code is busywork when the OS
automatically provides it for you (assuming you are on Windows, which is a
safe bet given you are running .NET, but not absolute, of course).

		- Nick

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 4:48 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for both answers on this.
I considered a zip file but was unsure of the associated overhead of
unpacking file. Does any one have experience running an index directly out
of zip file?
Are my worries unfounded? I was just trying to leverage the experience of
the group, but otherwise I'll just have to run some tests on my own.



On Fri, Feb 26, 2010 at 11:55 AM, Nicholas Petersen
<np...@gmail.com> wrote:

> <Can anyone recommend a way to package the index into say some type of
> file container>
>
> If I understand correctly, it sounds like you're asking for a text-book
> implementation of an archiver, like a zip file.  If so, DotNetZip is a
> solid product, very easy to use, very fast.  Highly recommended.
> http://www.codeplex.com/DotNetZip.
>
> Best,
> Nick
>
>
>
> On Fri, Feb 26, 2010 at 2:47 PM, Andrew Schuler <andrew.schuler@gmail.com>
> wrote:
>
> > Yes, that is do-able. I was just thinking it would be cleaner to wrap
> > the indexes (there will be more than one) in some sort of file
> > container. One of the things I'd like to do is allow the user to
> > download pre-packaged indexes and load them into the app. This would be
> > easier with a single file than with a directory of files, no?
> >
> >
> > On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:
> >
> > > Can't you add all the files in the index directory to the installer
> > > package?
> > > This should be pretty straightforward.
> > >
> > > -----Original Message-----
> > > From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> > > Sent: Friday, February 26, 2010 12:16 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Lucene index file container
> > >
> > > The discussion about encrypting an index has me thinking about a
> current
> > > use
> > > I have for Lucene.net. I'm building a small app with a static index
> > > distributed with it. Can anyone recommend a way to package the index
> into
> > > say some type of file container for inclusion in an installer package?
> > >
> > > -andy
> > >
> > >
> > >
> >
>

RE: Lucene index file container

Posted by Digy <di...@gmail.com>.
Hi Nick,

> "If the libraries are offering you a read/write stream which is seekable,"
As I mentioned in my first mail, that is the problem. They are not seekable.

You have to uncompress 900M of data to reach offset 900M.

So, to avoid reading from 0 to the offset whenever a seek request is made,
you have to unzip the whole file at the beginning and use that in your app.
But this is not what I understand from Andrew's question:
"Does any one have experience running an index directly out of zip file?"


DIGY



RE: Lucene index file container

Posted by "Nicholas Paldino [.NET/C# MVP]" <ca...@caspershouse.com>.
Digy,

	You are *always* going to incur the CPU cost because in order to get
to an offset in the ^uncompressed^ file, you have to process the 100MB file
and translate it into the 1GB file.  That cost is always incurred no matter
what.

	Now, what you do with that 1GB and what the libraries do and how
they expose it is implementation-dependent.

	If the libraries just stream the uncompressed data back to you in a
forward-only, read-only way, then it's up to the library consumer to take
care of that in some way.  This usually means keeping it in memory (in which
case, you have to worry about excessive memory consumption) or writing it to
disk (in which case, you incur I/O costs).
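
As a rough sketch of the "write it to disk" option for a forward-only
stream (again in Python for brevity, with gzip standing in for the .NET
stream classes; the helper name inflate_to_tempfile and the payload are
assumptions invented for this example), the consumer can inflate the data
once into a seekable temporary file and then seek freely:

```python
import gzip
import io
import shutil
import tempfile


def inflate_to_tempfile(gz_bytes):
    """Decompress once into a seekable temp file (the I/O cost is paid up front)."""
    tmp = tempfile.TemporaryFile()
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        shutil.copyfileobj(stream, tmp)  # forward-only pass over the data
    tmp.seek(0)
    return tmp


def demo():
    # Hypothetical payload; a real index file would go here.
    payload = bytes(i % 251 for i in range(200_000))
    with inflate_to_tempfile(gzip.compress(payload)) as f:
        f.seek(150_000)          # jump forward
        first = f.read(16)
        f.seek(10)               # jump backward, no re-inflation needed
        second = f.read(16)
    return first == payload[150_000:150_016] and second == payload[10:26]
```

Swapping the temporary file for an in-memory buffer such as io.BytesIO gives
the "keep it in memory" variant, trading the disk I/O for memory
consumption.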

	If the libraries are offering you a read/write stream which is
seekable, then it becomes completely implementation-dependent.  It might
very well use temp files, which incur I/O costs, or place data in memory (or
memory mapped files), which incurs a memory (and possibly I/O) cost.  I
don't have details about the specific libraries, but that's generally what
you are looking at in terms of strategies for providing this kind of
functionality.

		- Nick


RE: Lucene index file container

Posted by Digy <di...@gmail.com>.
Hi Nick,
Suppose that I have a 1G file with a compressed size of 100M and I want to
read just a 4K block from offset 900M.
Considering SharpZipLib, DotNetZip or similar libraries, would the cost be
more CPU and less IO? Or more CPU and more IO?

DIGY



RE: Lucene index file container

Posted by Hans Merkl <hm...@hmerkl.com>.
Nick,

No problem. I didn't take the time to be a bit clearer. I didn't mean to
recompress the file. My understanding is that 
- He has to download new indices
- File size is not a concern

Based on that I would recommend to zip each index into a file, download,
uncompress, store the index somewhere and use the uncompressed index.
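
A minimal sketch of that workflow, leaving out the download step. It is in
Python rather than .NET only to keep it short; the function names
pack_index and unpack_index, the directory layout and the placeholder file
names are all assumptions for illustration, not part of any Lucene.Net API.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_index(index_dir, archive):
    """Zip every file of a flat index directory into a single archive."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(Path(index_dir).iterdir()):
            if f.is_file():
                zf.write(f, arcname=f.name)


def unpack_index(archive, target_dir):
    """Extract once; the app then opens target_dir as an ordinary index.

    Assumes the archive comes from a trusted source.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return sorted(p.name for p in target.iterdir())


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "index"
        src.mkdir()
        # Placeholder names and contents, not a real Lucene index.
        (src / "segments.gen").write_bytes(b"\x00" * 16)
        (src / "_0.cfs").write_bytes(b"example" * 100)
        pack_index(src, root / "index.zip")
        return unpack_index(root / "index.zip", root / "app-data" / "index")
```

The unpacking cost is paid once per downloaded index; after that, searches
run against plain uncompressed files.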

The other proposed solutions like having a compressed stream (I think this
will be very slow) or using SolFS (have you seen the price list?) sound
technically interesting but they seem way too complex for the task he wants
to accomplish. 

Maybe I am missing something?

Hans


-----Original Message-----
From: Nicholas Paldino [.NET/C# MVP] [mailto:casperOne@caspershouse.com] 
Sent: Friday, February 26, 2010 7:07 PM
To: lucene-net-user@lucene.apache.org
Subject: RE: Lucene index file container

Hans,

	With all due respect, that's an unqualified statement.  What is the
gain in unzipping the file and rezipping it if file size is not a concern?
All you do is incur CPU time and I/O costs in doing so with no gains
whatsoever.

	Again, if you have a need to transport the index, then you should
perform the act of placing it in a container outside the scope of your
application, as you are more than likely going to transport the index less
than you are actually going to *use* the index.

		- Nick

-----Original Message-----
From: Hans Merkl [mailto:hm@hmerkl.com] 
Sent: Friday, February 26, 2010 6:29 PM
To: lucene-net-user@lucene.apache.org
Subject: RE: Lucene index file container

Then just unzip the index after downloading and store it in the application
directory. That's the best approach IMO.

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 6:18 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for all the comments.
For what it's worth, for what I'm doing file size is not a concern. Index
performance is paramount. The index will be static, no adding or deleting;
it's read-only.


On Fri, Feb 26, 2010 at 2:49 PM, Nicholas Paldino [.NET/C# MVP] <
casperOne@caspershouse.com> wrote:

> Andrew,
>
>        If you are going to unpack the index into a temp directory and then
> repack the file when you are done, then you are going to incur a cost on
> startup and on teardown of the process which is mainly I/O and CPU bound
> (I/O because you have to read the zip file from disk and then write the
> unpacked file from the zip to another location, and CPU bound because you
> are translating the byte stream while unpacking).
>
>        That approach doesn't do anything but add that additional I/O and
> CPU overhead on startup.  The "big win" for compressing the file is to save
> space on disk, or whatever medium the byte stream is being persisted to.
>
>        If all you do is unzip the file in the beginning and zip it up at
> the end, then from your app's point of view, you do a lot of extra work for
> nothing.  Unless you have real disk space issues, I'd recommend against
> this.
>
>        Now, if you were to create a new Directory class which uses a
> GZipStream or DeflateStream as a façade over the FileStream which writes to
> disk, then you are reaping the benefits of compressing the file.  The index
> will always be compressed on disk and you are realizing the gains.
>
>        The cost of doing this, however, is more CPU time (to perform the
> translation), but with the gain of fewer I/O operations to disk (since
> fewer bytes are being written to disk).
>
>        Depending on how much activity you have on reading/writing to/from
> the index it might or might not make an impact.  You have to measure that
> yourself given your application's use of the index.
>
>        If file size is ^truly^ a concern, have you considered just setting
> the compression flag on the *folder* that contains the index files?  Any
> files that are added/updated/deleted will automatically be compressed if the
> flag is set on the folder, so doing it in code is busywork when the OS
> automatically provides it for you (assuming you are on Windows, which is a
> safe bet given you are running .NET, but not absolute, of course).
>
>                - Nick
>




RE: Lucene index file container

Posted by "Nicholas Paldino [.NET/C# MVP]" <ca...@caspershouse.com>.
Hans,

	With all due respect, that's an unqualified statement.  What is the
gain in unzipping the file and rezipping it if file size is not a concern?
All you do is incur CPU time and I/O costs in doing so with no gains
whatsoever.

	Again, if you have a need to transport the index, then you should
perform the act of placing it in a container outside the scope of your
application, as you are more than likely going to transport the index less
than you are actually going to *use* the index.

		- Nick

-----Original Message-----
From: Hans Merkl [mailto:hm@hmerkl.com] 
Sent: Friday, February 26, 2010 6:29 PM
To: lucene-net-user@lucene.apache.org
Subject: RE: Lucene index file container

Then just unzip the index after downloading and store it in the application
directory. That's the best approach IMO.

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 6:18 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for all the comments.
For what it's worth, for what I'm doing file size is not a concern. Index
performance is paramount. The index will be static, no adding or deleting;
it's read-only.




RE: Lucene index file container

Posted by Hans Merkl <hm...@hmerkl.com>.
Then just unzip the index after downloading and store it in the application
directory. That's the best approach IMO.
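
A minimal sketch of that one-time unpack step, assuming the index is
downloaded as an ordinary zip archive and that System.IO.Compression.ZipFile
is available (.NET Framework 4.5 or later; on older runtimes a library such
as DotNetZip would fill the same role). The names IndexInstaller and
EnsureUnpacked, and the folder layout, are hypothetical:

```csharp
using System;
using System.IO;
using System.IO.Compression; // ZipFile is in System.IO.Compression.FileSystem on .NET Framework

static class IndexInstaller
{
    // Unpacks a downloaded index archive once; later runs reuse the unpacked files.
    public static string EnsureUnpacked(string zipPath, string indexRoot)
    {
        // One sub-directory per archive, e.g. "products.zip" -> "<indexRoot>/products".
        string target = Path.Combine(indexRoot, Path.GetFileNameWithoutExtension(zipPath));

        if (!Directory.Exists(target))
        {
            // Extract to a temporary sibling first so that a failed extraction never
            // leaves a half-written index where the app expects a complete one.
            string staging = target + ".tmp";
            if (Directory.Exists(staging)) Directory.Delete(staging, true);
            ZipFile.ExtractToDirectory(zipPath, staging);
            Directory.Move(staging, target);
        }
        return target;
    }
}
```

The returned path is what would be handed to the index directory class at
startup, so the unzip cost is paid only on the first run after a download.
If the install directory is not writable for the user, a per-user data
folder may be the more practical choice for indexRoot.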

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 6:18 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for all the comments.
For what it's worth, for what I'm doing file size is not a concern. Index
performance is paramount. The index will be static, no adding or deleting;
it's read-only.





RE: Lucene index file container

Posted by "Nicholas Paldino [.NET/C# MVP]" <ca...@caspershouse.com>.
Andrew,

	If that's the case, then you shouldn't be considering compressing
the index; it's just going to add overhead which you don't need.

		- Nick

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 6:18 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for all the comments.
For what it's worth, for what I'm doing file size is not a concern. Index
performance is paramount. The index will be static, no adding or deleting;
it's read-only.



Re: Lucene index file container

Posted by Andrew Schuler <an...@gmail.com>.
Thanks for all the comments.
For what it's worth, for what I'm doing file size is not a concern. Index
performance is paramount. The index will be static, no adding or deleting;
it's read-only.


On Fri, Feb 26, 2010 at 2:49 PM, Nicholas Paldino [.NET/C# MVP] <
casperOne@caspershouse.com> wrote:

> Andrew,
>
>        If you are going to unpack the index into a temp directory and then
> repack the file when you are done, then you are going to incur a cost
> on startup and on teardown of the process which is mainly I/O and CPU bound
> (I/O because you have to read the zip file from disk and then write the
> unpacked file from the zip to another location, and CPU bound because you
> are translating the byte stream while unpacking).
>
>        That approach doesn't do anything but add that additional I/O and
> CPU overhead on startup.  The "big win" for compressing the file is to save
> space on disk, or whatever medium the byte stream is being persisted to.
>
>        If all you do is unzip the file in the beginning and zip it up at
> the end, then from your app's point of view, you do a lot of extra work for
> nothing.  Unless you have real disk space issues, I'd recommend against
> this.
>
>        Now, if you were to create a new Directory class which uses a
> GZipStream or DeflateStream as a façade over the FileStream which writes to
> disk, then you are reaping the benefits of compressing the file.  The index
> will always be compressed on disk and you are realizing the gains.
>
>        The cost of doing this, however, is more CPU time (to perform the
> translation), but with the gain of fewer I/O operations to disk (since
> fewer bytes are being written to disk).
>
>        Depending on how much activity you have on reading/writing to/from
> the index it might or might not make an impact.  You have to measure that
> yourself given your application's use of the index.
>
>        If file size is ^truly^ a concern, have you considered just setting
> the compression flag on the *folder* that contains the index files?  Any
> files that are added/updated/deleted will automatically be compressed if
> the
> flag is set on the folder, so doing it in code is busywork when the OS
> automatically provides it for you (assuming you are on Windows, which is a
> safe bet given you are running .NET, but not absolute, of course).
>
>                - Nick
>

RE: Lucene index file container

Posted by "Nicholas Paldino [.NET/C# MVP]" <ca...@caspershouse.com>.
Andrew,

	If you are going to unpack the index into a temp directory and then
repack the file when you are done, then you are going to incur a cost
on startup and on teardown of the process which is mainly I/O and CPU bound
(I/O because you have to read the zip file from disk and then write the
unpacked file from the zip to another location, and CPU bound because you
are translating the byte stream while unpacking).

	That approach doesn't do anything but add that additional I/O and
CPU overhead on startup.  The "big win" for compressing the file is to save
space on disk, or whatever medium the byte stream is being persisted to.

	If all you do is unzip the file in the beginning and zip it up at
the end, then from your app's point of view, you do a lot of extra work for
nothing.  Unless you have real disk space issues, I'd recommend against
this.

	Now, if you were to create a new Directory class which uses a
GZipStream or DeflateStream as a façade over the FileStream which writes to
disk, then you are reaping the benefits of compressing the file.  The index
will always be compressed on disk and you are realizing the gains.
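
A minimal sketch of the stream layering described here, not a working
Lucene Directory: a GZipStream wraps the FileStream so that only compressed
bytes reach the disk. The class name, method names and file name are
hypothetical, and Stream.CopyTo assumes .NET 4.0 or later:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class CompressedFileDemo
{
    // Writes bytes through a GZipStream so only compressed data reaches the disk.
    public static void WriteCompressed(string path, byte[] data)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var gzip = new GZipStream(file, CompressionMode.Compress))
        {
            gzip.Write(data, 0, data.Length);
        }
    }

    // Reads the file back, decompressing on the fly.
    public static byte[] ReadCompressed(string path)
    {
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var buffer = new MemoryStream())
        {
            gzip.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "segment-demo.gz");
        byte[] original = Encoding.UTF8.GetBytes(new string('a', 10000));

        WriteCompressed(path, original);
        byte[] roundTripped = ReadCompressed(path);

        // Compare the compressed size on disk with the size seen by the caller.
        Console.WriteLine("on disk: {0} bytes, in memory: {1} bytes",
            new FileInfo(path).Length, roundTripped.Length);
        File.Delete(path);
    }
}
```

A real Directory implementation would also have to support seeking within
each file, which this sequential sketch does not attempt.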

	The cost of doing this, however, is more CPU time (to perform the
translation), but with the gain of fewer I/O operations to disk (since there
are fewer bytes being written to disk).

	Depending on how much activity you have on reading/writing to/from
the index it might or might not make an impact.  You have to measure that
yourself given your application's use of the index.

	If file size is ^truly^ a concern, have you considered just setting
the compression flag on the *folder* that contains the index files?  Any
files that are added/updated/deleted will automatically be compressed if the
flag is set on the folder, so doing it in code is busywork when the OS
automatically provides it for you (assuming you are on Windows, which is a
safe bet given you are running .NET, but not absolute, of course).
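
For completeness, the folder flag can also be set from code by shelling out
to the Windows compact.exe tool. This is only a sketch, assuming Windows and
an NTFS volume; the class and method names are hypothetical, and doing the
same thing once by hand in the folder's properties dialog is just as valid:

```csharp
using System;
using System.Diagnostics;

static class NtfsCompression
{
    // Runs "compact /c /s:<folder> /i /q", which asks Windows to compress
    // the folder's contents recursively (/c = compress, /s = include
    // subdirectories, /i = continue on errors, /q = brief output).
    public static int CompressFolder(string folder)
    {
        var info = new ProcessStartInfo
        {
            FileName = "compact.exe",
            Arguments = "/c /s:\"" + folder + "\" /i /q",
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (Process process = Process.Start(info))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```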

		- Nick

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 4:48 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for both answers on this.
I considered a zip file but was unsure of the associated overhead of
unpacking the file. Does anyone have experience running an index directly
out of a zip file?
Are my worries unfounded? I was just trying to leverage the experience of
the group, but otherwise I'll just have to run some tests on my own.




RE: Lucene index file container

Posted by Digy <di...@gmail.com>.
I don't think you can find an easy (and performant) way to run an
index directly out of a zip file.
First of all, it is not easy, if not impossible, to "seek" to an offset in a
zipped file without reading all the bytes from the beginning (resulting in
very bad performance).
Second, Lucene.Net does not have built-in support for zipped indexes. So
you would have to develop your own "ZippedDirectory" class.
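
The seek problem can be illustrated with a small sketch. It uses GZipStream
rather than a zip archive, and SkipTo is a hypothetical helper, but the
point is the same: a decompressing stream is not seekable, so reaching an
offset means decompressing and discarding everything before it.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GzipSeekDemo
{
    // Emulates Seek(offset) on a decompressing stream by reading and
    // discarding every byte that comes before the offset.
    public static void SkipTo(Stream decompressing, long offset)
    {
        var scratch = new byte[8192];
        long remaining = offset;
        while (remaining > 0)
        {
            int want = (int)Math.Min(scratch.Length, remaining);
            int got = decompressing.Read(scratch, 0, want);
            if (got == 0) throw new EndOfStreamException();
            remaining -= got;
        }
    }

    static void Main()
    {
        // Build a small gzip stream in memory (leaveOpen keeps the MemoryStream usable).
        var compressed = new MemoryStream();
        using (var gzip = new GZipStream(compressed, CompressionMode.Compress, true))
        {
            for (int i = 0; i < 100000; i++) gzip.WriteByte((byte)i);
        }
        compressed.Position = 0;

        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        {
            Console.WriteLine(gzip.CanSeek);    // GZipStream does not support seeking
            SkipTo(gzip, 70000);                // work done grows with the offset
            Console.WriteLine(gzip.ReadByte()); // the byte stored at offset 70000
        }
    }
}
```

Since an index reader jumps around inside its files constantly, paying this
cost on every seek is what makes running directly out of the archive slow.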

DIGY

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 11:48 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for both answers on this.
I considered a zip file but was unsure of the associated overhead of
unpacking the file. Does anyone have experience running an index directly
out of a zip file?
Are my worries unfounded? I was just trying to leverage the experience of
the group, but otherwise I'll just have to run some tests on my own.





RE: Lucene index file container

Posted by Franklin Simmons <fs...@sccmediaserver.com>.
Consider using SharpZipLib, chock-full of open source goodness.  Alternatively, with a little forethought you could roll your own zipper on top of System.IO.Compression.GZipStream; there are several examples on the net, so just fire up your favorite search engine and poke around.
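
A rough sketch of what such a home-grown container could look like, assuming
the index directory is flat (no sub-directories) and every file is well under
2 GB. GZipStream only compresses a single stream, so the file names and
lengths have to be written into that stream by hand; the IndexContainer
class and its entry layout are made up for illustration:

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class IndexContainer
{
    // Packs every file directly inside sourceDir into one gzip-compressed file.
    public static void Pack(string sourceDir, string containerPath)
    {
        string[] files = Directory.GetFiles(sourceDir);
        using (var output = File.Create(containerPath))
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        using (var writer = new BinaryWriter(gzip))
        {
            writer.Write(files.Length);               // number of entries
            foreach (string file in files)
            {
                byte[] content = File.ReadAllBytes(file);
                writer.Write(Path.GetFileName(file)); // length-prefixed name
                writer.Write(content.Length);         // payload size
                writer.Write(content);                // payload
            }
        }
    }

    // Recreates the files in targetDir; meant to run once after download or install.
    public static void Unpack(string containerPath, string targetDir)
    {
        Directory.CreateDirectory(targetDir);
        using (var input = File.OpenRead(containerPath))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new BinaryReader(gzip))
        {
            int count = reader.ReadInt32();
            for (int i = 0; i < count; i++)
            {
                // Keep only the file name so an entry cannot escape targetDir.
                string name = Path.GetFileName(reader.ReadString());
                int length = reader.ReadInt32();
                byte[] content = reader.ReadBytes(length);
                if (content.Length != length) throw new EndOfStreamException();
                File.WriteAllBytes(Path.Combine(targetDir, name), content);
            }
        }
    }
}
```

Pack would run at build time and Unpack once on the user's machine, after
which the application opens the unpacked directory as usual. A standard zip
library gives the same result with less code and a format other tools can
read.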


-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 4:48 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Thanks for both answers on this.
I considered a zip file but was unsure of the associated overhead of
unpacking the file. Does anyone have experience running an index directly
out of a zip file?
Are my worries unfounded? I was just trying to leverage the experience of
the group, but otherwise I'll just have to run some tests on my own.



On Fri, Feb 26, 2010 at 11:55 AM, Nicholas Petersen
<np...@gmail.com>wrote:

> <Can anyone recommend a way to package the index into say some type of file
> container>
>
> If I understand correctly, it sounds like you're asking for a textbook
> implementation of an archiver, like a zip file.  If so, DotNetZip is a
> solid product, very easy to use, very fast.  Highly recommended.
> http://www.codeplex.com/DotNetZip.
>
> Best,
> Nick
>
>
>
> On Fri, Feb 26, 2010 at 2:47 PM, Andrew Schuler <andrew.schuler@gmail.com> wrote:
>
> > Yes, that is do-able. I was just thinking it would be cleaner to wrap the
> > indexes (there will be more than one) in some sort of file container. One
> > of the things I'd like to do is allow the user to download pre-packaged
> > indexes and load them into the app. This would be easier with a file than
> > with a directory of files, no?
> >
> >
> > On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:
> >
> > > Can't you add all the files in the index directory to the installer
> > > package?
> > > This should be pretty straightforward.
> > >
> > > -----Original Message-----
> > > From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> > > Sent: Friday, February 26, 2010 12:16 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Lucene index file container
> > >
> > > The discussion about encrypting an index has me thinking about a
> > > current use I have for Lucene.net. I'm building a small app with a
> > > static index distributed with it. Can anyone recommend a way to package
> > > the index into say some type of file container for inclusion in an
> > > installer package?
> > >
> > > -andy
> > >
> > >
> > >
> >
>

Re: Lucene index file container

Posted by Andrew Schuler <an...@gmail.com>.
Thanks for both answers on this.
I considered a zip file but was unsure of the associated overhead of
unpacking the files. Does anyone have experience running an index directly
out of a zip file?
Are my worries unfounded? I was just trying to leverage the experience of
the group, but otherwise I'll just have to run some tests on my own.
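
One way to keep the unpacking overhead to a one-time cost is to extract the archive on first use and open the extracted directory afterwards. The sketch below assumes the ZipFile class that ships with .NET Framework 4.5 and later (newer than this thread); with SharpZipLib or DotNetZip the extraction call would differ. IndexCache, EnsureUnpacked and the ".ok" marker file are names invented for this example. As far as I know Lucene seeks around inside its files, which a compressed zip entry does not allow, so searching straight out of the zip would need a custom Directory implementation.

```csharp
using System.IO;
using System.IO.Compression;   // ZipFile: assembly System.IO.Compression.FileSystem on .NET Framework 4.5+

// Hypothetical helper: unpacks a zipped index once and reuses the result afterwards.
public static class IndexCache
{
    // Returns the directory holding the unpacked index, extracting only when needed.
    public static string EnsureUnpacked(string zipPath, string cacheRoot)
    {
        string target = Path.Combine(cacheRoot, Path.GetFileNameWithoutExtension(zipPath));
        string marker = target + ".ok";    // sibling file, kept outside the index directory

        if (!File.Exists(marker))
        {
            // No marker means no extraction ever completed: clear any partial result.
            if (Directory.Exists(target))
                Directory.Delete(target, true);

            Directory.CreateDirectory(cacheRoot);
            ZipFile.ExtractToDirectory(zipPath, target);

            // Written last, so the marker only exists after a complete extraction.
            File.WriteAllText(marker, "");
        }
        return target;
    }
}
```

The returned path can then be handed to FSDirectory as usual, so only the first launch pays for the extraction.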



On Fri, Feb 26, 2010 at 11:55 AM, Nicholas Petersen <np...@gmail.com> wrote:

> <Can anyone recommend a way to package the index into say some type of file
> container>
>
> If I understand correctly, it sounds like you're asking for a textbook
> implementation of an archiver, like a zip file.  If so, DotNetZip is a
> solid product, very easy to use, very fast.  Highly recommended.
> http://www.codeplex.com/DotNetZip.
>
> Best,
> Nick
>
>
>
> On Fri, Feb 26, 2010 at 2:47 PM, Andrew Schuler <andrew.schuler@gmail.com> wrote:
>
> > Yes, that is do-able. I was just thinking it would be cleaner to wrap the
> > indexes (there will be more than one) in some sort of file container. One
> > of the things I'd like to do is allow the user to download pre-packaged
> > indexes and load them into the app. This would be easier with a file than
> > with a directory of files, no?
> >
> >
> > On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:
> >
> > > Can't you add all the files in the index directory to the installer
> > > package?
> > > This should be pretty straightforward.
> > >
> > > -----Original Message-----
> > > From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> > > Sent: Friday, February 26, 2010 12:16 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Lucene index file container
> > >
> > > The discussion about encrypting an index has me thinking about a
> > > current use I have for Lucene.net. I'm building a small app with a
> > > static index distributed with it. Can anyone recommend a way to package
> > > the index into say some type of file container for inclusion in an
> > > installer package?
> > >
> > > -andy
> > >
> > >
> > >
> >
>

Re: Lucene index file container

Posted by Nicholas Petersen <np...@gmail.com>.
<Can anyone recommend a way to package the index into say some type of file
container>

If I understand correctly, it sounds like you're asking for a textbook
implementation of an archiver, like a zip file.  If so, DotNetZip is a solid
product, very easy to use, very fast.  Highly recommended.
http://www.codeplex.com/DotNetZip.

Best,
Nick



On Fri, Feb 26, 2010 at 2:47 PM, Andrew Schuler <an...@gmail.com> wrote:

> Yes, that is do-able. I was just thinking it would be cleaner to wrap the
> indexes (there will be more than one) in some sort of file container. One
> of the things I'd like to do is allow the user to download pre-packaged
> indexes and load them into the app. This would be easier with a file than
> with a directory of files, no?
>
>
> On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:
>
> > Can't you add all the files in the index directory to the installer
> > package?
> > This should be pretty straightforward.
> >
> > -----Original Message-----
> > From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> > Sent: Friday, February 26, 2010 12:16 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Lucene index file container
> >
> > The discussion about encrypting an index has me thinking about a current
> > use I have for Lucene.net. I'm building a small app with a static index
> > distributed with it. Can anyone recommend a way to package the index into
> > say some type of file container for inclusion in an installer package?
> >
> > -andy
> >
> >
> >
>

Re: Lucene index file container

Posted by Andrew Schuler <an...@gmail.com>.
SolFS looks very interesting, I'll have to investigate further.
Thanks everyone for your comments, this has been an interesting discussion.
At least it has shown me that I'm not overlooking something obvious.
-andy



On Sat, Feb 27, 2010 at 2:47 AM, Digy <di...@gmail.com> wrote:

> Hi Andrew,
> I think you are looking for a library like SolFS, but it is not free. If
> you decide to use such a library, you also have to implement Lucene's
> Directory class to make the FS structure transparent to Lucene.
>
> You can also use the source of RAMDirectory as a sample to develop your own
> storage on disk.
>
> DIGY
>
>
> -----Original Message-----
> From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> Sent: Friday, February 26, 2010 9:47 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: Lucene index file container
>
> Yes, that is do-able. I was just thinking it would be cleaner to wrap the
> indexes (there will be more than one) in some sort of file container. One
> of the things I'd like to do is allow the user to download pre-packaged
> indexes and load them into the app. This would be easier with a file than
> with a directory of files, no?
>
>
> On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:
>
> > Can't you add all the files in the index directory to the installer
> > package?
> > This should be pretty straightforward.
> >
> > -----Original Message-----
> > From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> > Sent: Friday, February 26, 2010 12:16 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Lucene index file container
> >
> > The discussion about encrypting an index has me thinking about a current
> > use I have for Lucene.net. I'm building a small app with a static index
> > distributed with it. Can anyone recommend a way to package the index into
> > say some type of file container for inclusion in an installer package?
> >
> > -andy
> >
> >
> >
>
>

RE: Lucene index file container

Posted by Digy <di...@gmail.com>.
Hi Andrew,
I think you are looking for a library like SolFS, but it is not free. If you
decide to use such a library, you also have to implement Lucene's Directory
class to make the FS structure transparent to Lucene.

You can also use the source of RAMDirectory as a sample to develop your own
storage on disk.

DIGY
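
To make the "own storage on disk" idea a little more concrete, here is a minimal sketch of a single-file store that a custom Directory could sit on top of. It does not implement Lucene's Directory class itself, since those method signatures differ between Lucene.Net versions. FlatStore and its layout (entry count, then name/offset/length per file, then the raw bytes) are hypothetical. Entries are stored uncompressed so they can be read at any position, which is what an index input needs when it seeks. The class is not thread-safe, and Stream.CopyTo assumes .NET 4.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical uncompressed container: [count][name, offset, length]...[file bytes]...
public sealed class FlatStore : IDisposable
{
    private readonly FileStream _file;
    private readonly long _dataStart;   // position of the first payload byte
    private readonly Dictionary<string, long[]> _entries = new Dictionary<string, long[]>();

    // Packs every file in sourceDir into a single container file.
    public static void Create(string sourceDir, string containerPath)
    {
        string[] files = Directory.GetFiles(sourceDir);
        using (FileStream fs = File.Create(containerPath))
        using (BinaryWriter w = new BinaryWriter(fs))
        {
            w.Write(files.Length);
            long offset = 0;                        // relative to the end of the header
            foreach (string f in files)
            {
                long length = new FileInfo(f).Length;
                w.Write(Path.GetFileName(f));
                w.Write(offset);
                w.Write(length);
                offset += length;
            }
            w.Flush();
            foreach (string f in files)             // payloads, in header order
            {
                using (FileStream src = File.OpenRead(f))
                    src.CopyTo(fs);
            }
        }
    }

    // Opens a container and loads its table of entries.
    public FlatStore(string containerPath)
    {
        _file = File.OpenRead(containerPath);
        BinaryReader r = new BinaryReader(_file);   // not disposed here: that would close _file
        int count = r.ReadInt32();
        for (int i = 0; i < count; i++)
        {
            string name = r.ReadString();
            long offset = r.ReadInt64();
            long length = r.ReadInt64();
            _entries[name] = new long[] { offset, length };
        }
        _dataStart = _file.Position;
    }

    public bool Exists(string name) { return _entries.ContainsKey(name); }

    public long Length(string name) { return _entries[name][1]; }

    // Reads up to count bytes of the named entry, starting at position within that entry.
    public int Read(string name, long position, byte[] buffer, int offset, int count)
    {
        long[] entry = _entries[name];
        long remaining = entry[1] - position;
        if (remaining <= 0) return 0;
        int wanted = (int)Math.Min(count, remaining);
        _file.Seek(_dataStart + entry[0] + position, SeekOrigin.Begin);
        int total = 0;
        while (total < wanted)
        {
            int n = _file.Read(buffer, offset + total, wanted - total);
            if (n == 0) break;                      // unexpected end of file
            total += n;
        }
        return total;
    }

    public void Dispose() { _file.Dispose(); }
}
```

A Directory subclass for a read-only index would then map its file listing, length and open-input calls onto Exists, Length and Read, in the same way RAMDirectory maps them onto in-memory buffers.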


-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 9:47 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: Lucene index file container

Yes, that is do-able. I was just thinking it would be cleaner to wrap the
indexes (there will be more than one) in some sort of file container. One of
the things I'd like to do is allow the user to download pre-packaged indexes
and load them into the app. This would be easier with a file than with a
directory of files, no?


On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:

> Can't you add all the files in the index directory to the installer
> package?
> This should be pretty straightforward.
>
> -----Original Message-----
> From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> Sent: Friday, February 26, 2010 12:16 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Lucene index file container
>
> The discussion about encrypting an index has me thinking about a current
> use I have for Lucene.net. I'm building a small app with a static index
> distributed with it. Can anyone recommend a way to package the index into
> say some type of file container for inclusion in an installer package?
>
> -andy
>
>
>


Re: Lucene index file container

Posted by Andrew Schuler <an...@gmail.com>.
Yes, that is do-able. I was just thinking it would be cleaner to wrap the
indexes (there will be more than one) in some sort of file container. One of
the things I'd like to do is allow the user to download pre-packaged indexes
and load them into the app. This would be easier with a file than with a
directory of files, no?


On Fri, Feb 26, 2010 at 11:41 AM, Hans Merkl <hm...@hmerkl.com> wrote:

> Can't you add all the files in the index directory to the installer
> package?
> This should be pretty straightforward.
>
> -----Original Message-----
> From: Andrew Schuler [mailto:andrew.schuler@gmail.com]
> Sent: Friday, February 26, 2010 12:16 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Lucene index file container
>
> The discussion about encrypting an index has me thinking about a current
> use I have for Lucene.net. I'm building a small app with a static index
> distributed with it. Can anyone recommend a way to package the index into
> say some type of file container for inclusion in an installer package?
>
> -andy
>
>
>

RE: Lucene index file container

Posted by Hans Merkl <hm...@hmerkl.com>.
Can't you add all the files in the index directory to the installer package?
This should be pretty straightforward.

-----Original Message-----
From: Andrew Schuler [mailto:andrew.schuler@gmail.com] 
Sent: Friday, February 26, 2010 12:16 PM
To: lucene-net-user@lucene.apache.org
Subject: Lucene index file container

The discussion about encrypting an index has me thinking about a current use
I have for Lucene.net. I'm building a small app with a static index
distributed with it. Can anyone recommend a way to package the index into
say some type of file container for inclusion in an installer package?

-andy