Posted to dev@lucene.apache.org by Charles Lloyd <ch...@mac.com> on 2005/11/13 01:56:35 UTC

Shared Field Values

Would the following be a reasonable feature to add to Lucene?

We use Lucene for a catalog of about 3 million items, where each document represents one item.  Some Fields are highly redundant, such as "Manufacturer Name": we have only a few hundred different manufacturers.  I would like to be able to declare this Field as Indexed, Stored, and 'Shared' so that only one copy of each Name is stored.  This would save about 45% of the space in our catalog.

I considered using a code for each Name, but then we couldn't search on the Name.  I also considered using a pair of Unstored and Stored fields with a code, but this becomes unwieldy, since we have many fields for which this could be done, and it breaks a lot of existing code.  I considered several other things as well, but I think the best solution is a "Shared" Field type.

I am new to Lucene dev so I would appreciate it if someone could outline how to approach this.  It looks like FieldsWriter.addDocument(...) is a good place to make the substitution, but I've got no idea where to store the actual values.  We'd need a segment of char[] data stored somewhere that could be accessed later when FieldsReader.doc(...) is called.  Each shared Field would need to write out the offset and length rather than the value itself.
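
To make the offset-and-length idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes; SharedValuePool, intern, and resolve are hypothetical names chosen for illustration, and the pool lives in memory only, so how it would be persisted alongside a segment remains the open question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Lucene.
public class SharedValuePool {
    // All distinct values, concatenated back to back.
    private final StringBuilder data = new StringBuilder();
    // Offset into 'data' at which each distinct value starts.
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Returns {offset, length} for the value, appending it only on first use. */
    public int[] intern(String value) {
        Integer offset = offsets.get(value);
        if (offset == null) {
            offset = data.length();
            data.append(value);
            offsets.put(value, offset);
        }
        return new int[] { offset, value.length() };
    }

    /** Recovers the original value, with its case preserved. */
    public String resolve(int offset, int length) {
        return data.substring(offset, offset + length);
    }

    /** Total number of chars held by the pool. */
    public int size() {
        return data.length();
    }
}
```

Under this sketch, the writer side would call intern(...) for each shared Field and write the two ints instead of the String, and the reader side would call resolve(...) with the same two ints.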

What would be the best way to store the shared data?

Charles.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Shared Field Values

Posted by Charles Lloyd <ch...@mac.com>.
On Saturday, November 12, 2005, at 05:37PM, Chris Hostetter <ho...@fucit.org> wrote:
>if the fields you are talking
>about "sharing" are always indexed, then you can leave them UnStored, and
>use a FieldCache.StringIndex to get the values.

This is a great suggestion, except that the indexed terms are all lower-cased, and I need to preserve case.

Charles.




Re: Shared Field Values

Posted by Chris Hostetter <ho...@fucit.org>.
The first thing that occurs to me is that if the fields you are talking
about "sharing" are always indexed, then you can leave them UnStored, and
use a FieldCache.StringIndex to get the values.
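
As a rough picture of why this avoids storing one copy per document, the sketch below keeps one array of distinct terms and one small int per document. It is plain Java modelled loosely on that idea and does not call Lucene; OrdinalIndexSketch and valueOf are hypothetical names, and the input array stands in for the single indexed term each document has in the field.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Hypothetical illustration only: not Lucene's implementation.
public class OrdinalIndexSketch {
    // Sorted distinct terms; each one is held exactly once.
    public final String[] lookup;
    // order[docId] is the position of that document's term in 'lookup'.
    public final int[] order;

    public OrdinalIndexSketch(String[] termPerDoc) {
        lookup = new TreeSet<>(Arrays.asList(termPerDoc)).toArray(new String[0]);
        order = new int[termPerDoc.length];
        for (int doc = 0; doc < termPerDoc.length; doc++) {
            // 'lookup' is sorted and contains every term, so the search always hits.
            order[doc] = Arrays.binarySearch(lookup, termPerDoc[doc]);
        }
    }

    /** Returns the term for a document by following its ordinal. */
    public String valueOf(int docId) {
        return lookup[order[docId]];
    }
}
```

Because the values come from the indexed terms, whatever the analyzer did to them at index time is what comes back.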



-Hoss

