Posted to java-user@lucene.apache.org by Ryan Aslett <Ry...@Qsent.com> on 2004/05/12 03:39:11 UTC

Fields

How much of a performance benefit/impact does "fielding" your data have
in Lucene?

Let's say I have 100 million documents, each with a Name, a Phone, and an
Address.

I could either index the terms in separate fields, like this:
Field.Text("Name","Bob Jones");
Field.Keyword("Phone","5551212");
Field.Text("Address","123 Main");
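To make the first layout concrete without pulling in Lucene, here is a toy model in plain Java. It assumes, as I understand Lucene's index, that every indexed term is a (field, text) pair and that the term dictionary is sorted by field and then by text. `FieldedIndex` and its methods are hypothetical names. Whitespace splitting stands in for whatever analyzer `Field.Text` would actually use, and no lowercasing is done here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model, not Lucene: terms are keyed by field first, then by text.
final class FieldedIndex {
    // field -> (term text -> ids of documents containing that term)
    private final TreeMap<String, TreeMap<String, List<Integer>>> postings = new TreeMap<>();

    // Like a tokenized field: each whitespace-separated token becomes a term.
    void addText(int doc, String field, String value) {
        for (String tok : value.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                addTerm(doc, field, tok);
            }
        }
    }

    // Like a keyword field: the whole value is a single, untokenized term.
    void addKeyword(int doc, String field, String value) {
        addTerm(doc, field, value);
    }

    private void addTerm(int doc, String field, String text) {
        postings.computeIfAbsent(field, f -> new TreeMap<>())
                .computeIfAbsent(text, t -> new ArrayList<>())
                .add(doc);
    }

    // A query names both the field and the term text.
    List<Integer> search(String field, String text) {
        TreeMap<String, List<Integer>> terms = postings.get(field);
        if (terms == null) {
            return List.of();
        }
        return terms.getOrDefault(text, List.of());
    }
}
```

With the example document indexed as document 0, `search("Name", "Jones")` finds it while `search("Address", "Jones")` does not, because the field is part of the term's identity.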

Or I could put everything in a single field, prepending a field
designator to each term and indexing it as a keyword, like this:
Field.Keyword("Universal","nmBob");
Field.Keyword("Universal","nmJones");
Field.Keyword("Universal","ph5551212");
Field.Keyword("Universal","ad123");
Field.Keyword("Universal","adMain");
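The prefixing step for the second layout might look like the following minimal sketch. It assumes fixed two-letter codes (`nm`, `ph`, `ad`, as in the examples) and whitespace tokenization. `UniversalTerms`, `encode`, `codeOf` and `textOf` are hypothetical names, and each string returned by `encode` is what one `Field.Keyword("Universal", ...)` call would receive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that turns (field code, value) into prefixed keyword terms.
final class UniversalTerms {
    // Assumed two-letter codes, matching the examples in the post.
    static final String NAME = "nm", PHONE = "ph", ADDRESS = "ad";

    // Split the value on whitespace and prefix every token with the code.
    static List<String> encode(String code, String value) {
        if (code.length() != 2) {
            throw new IllegalArgumentException("codes are assumed to be exactly two characters");
        }
        List<String> terms = new ArrayList<>();
        for (String tok : value.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                terms.add(code + tok);
            }
        }
        return terms;
    }

    // Because every code has the same length, a term splits back unambiguously.
    static String codeOf(String term) {
        return term.substring(0, 2);
    }

    static String textOf(String term) {
        return term.substring(2);
    }
}
```

Keeping every code the same length means a term can always be split back into code and value. Variable-length codes would instead need a separator character that never occurs in the values.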

When building my queries, I would then always search that same field and
prepend the "fieldcode" to the search term.

Let's also assume that these universal fields are only indexed, not
stored, and that I store something completely different as the actual
stored data.
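The query-side rewrite (always search the one field, with the code prepended to the search term) can be modeled with a toy single-field index that only indexes and stores nothing. This is again plain Java, not Lucene code, and `UniversalIndex` and its methods are hypothetical. Besides the lookup, it shows a side effect of keeping terms sorted: all terms that share a code form one contiguous run of the dictionary. I believe a separate field's terms form a similar run in Lucene's field-then-text ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model, not Lucene: one "Universal" field whose terms carry a
// two-letter field code as a prefix (hypothetical class and method names).
final class UniversalIndex {
    // Sorted term dictionary: prefixed term -> ids of documents containing it.
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    // Index side: split on whitespace and prefix every token with the code.
    void add(int doc, String code, String value) {
        for (String tok : value.trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                postings.computeIfAbsent(code + tok, k -> new ArrayList<>()).add(doc);
            }
        }
    }

    // Query side: prepend the same code and do a single exact-term lookup.
    List<Integer> search(String code, String term) {
        return postings.getOrDefault(code + term, List.of());
    }

    // All terms sharing a code sort next to each other, so they form one range.
    List<String> termsFor(String code) {
        return new ArrayList<>(
                postings.subMap(code, true, code + Character.MAX_VALUE, true).keySet());
    }
}
```

For example, after indexing document 0 as `nm` "Bob Jones", `ph` "5551212", `ad` "123 Main" and document 1 as `nm` "Jones Main", `search("ad", "Main")` returns only document 0, even though "Main" also appears in document 1's name.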

Assumptions:
* Indexing/preprocessing speed isn't important, unless it's orders of
  magnitude slower.
* 10 indexes of 10 million documents each.

Does anybody have any idea what impact this method would have on query
performance? What are the pros and cons?

A commercial product that we are using is much slower when data is
"fielded", and it has the concept of "unfielded literals". This second
method is how we currently field data, and it seems to give us a
tremendous performance boost. I'm curious whether Lucene works in a
similar fashion...

Ryan Aslett
