Posted to user@lucenenet.apache.org by Artem Chereisky <a....@gmail.com> on 2010/03/12 00:28:31 UTC

indexing design pattern question

Hi,

I'm looking for "best practices" advice on index building.

I have two types of documents I need to index, let's say type A and B.
I have 2 million documents of type A and 10,000 of type B.
I can see 3 options when it comes to building my index:

Option 1.
Add both document types to one index, without adding a document type field.
As long as field names are unique, there are no problems. When I search for
fields from doc type A, I get the right answers, and the same for type B.

Option 2.
Add both document types to one index, but add a document type field. Then
specify the document type field in each query (or apply a filter on
document type).

Option 3.
Build 2 indexes, one for each document type. Have two IndexSearchers, one
for each index.

So, which is the best option, from both performance and design perspectives?

Regards,
Art

RE: indexing design pattern question

Posted by Michael Garski <mg...@myspace-inc.com>.
Artem,

My preference would be either #2 or #3, as over time you may have some
overlap on the fields if your data changes.  #2 gives you the advantage
of being able to add new data types as your application changes over
time, while #3 would require you to spin up a new index.

As to whether to separate them or not, it depends :)  If one of the data
types is more volatile and requires a lot more updates than the other,
you may want to separate them.

Michael 
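
To make the difference between options #2 and #3 concrete, here is a
minimal sketch using a toy in-memory inverted index. It is not the
Lucene.Net API: ToyIndex, its add/search methods, the "type" and "title"
field names, and the sample documents are all hypothetical and exist only
for illustration. Both document types deliberately share a "title" field
to mimic the field overlap mentioned above.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory inverted index (a stand-in, not a real Lucene index)."""

    def __init__(self):
        self.docs = []                    # doc id -> stored fields
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids

    def add(self, fields):
        """Store a document and index every whitespace-separated term."""
        doc_id = len(self.docs)
        self.docs.append(fields)
        for field, value in fields.items():
            for term in str(value).lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, term, doc_type=None):
        """Return documents matching field:term, optionally limited by type."""
        hits = set(self.postings.get((field, term.lower()), set()))
        if doc_type is not None:
            # Option 2: intersect with the postings of the document type field.
            hits &= self.postings.get(("type", doc_type.lower()), set())
        return [self.docs[i] for i in sorted(hits)]


# Option 2: one shared index, every document carries a "type" field.
shared = ToyIndex()
shared.add({"type": "A", "title": "lucene indexing"})
shared.add({"type": "B", "title": "lucene searching"})
unfiltered_hits = shared.search("title", "lucene")            # both types match
option2_hits = shared.search("title", "lucene", doc_type="A")  # type A only

# Option 3: one index per document type, no type field needed.
index_a, index_b = ToyIndex(), ToyIndex()
index_a.add({"title": "lucene indexing"})
index_b.add({"title": "lucene searching"})
option3_hits = index_a.search("title", "lucene")
```

In this sketch, the unfiltered search on the shared index returns
documents of both types once the field names overlap, which is the risk
with option #1. Option #2 handles it with one extra clause per query,
while option #3 avoids the clause at the cost of maintaining a second
index.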
