Posted to user@lucenenet.apache.org by Artem Chereisky <a....@gmail.com> on 2010/03/12 00:28:31 UTC
indexing design pattern question
Hi,
I'm looking for "best practices" advice on index building.
I have two types of documents I need to index, let's say type A and B.
I have 2 million documents of type A and 10,000 of type B.
I can see 3 options when it comes to building my index:
Option 1.
Add both document types to one index, without adding a document type field.
As long as field names are unique, there are no problems. When I search for
fields from doc type A, I get the right answers, and the same for type B.
Option 2.
Add both document types to one index, but add a document type field. Then
add the document type field to each query (or apply a filter on document
type).
Option 3.
Build 2 indexes, one for each document type. Have two IndexSearchers, one
for each index.
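To make the three layouts concrete, here is a minimal sketch in Python using a toy in-memory inverted index. It is not the Lucene.Net API: the `ToyIndex` class, the field names (`a_title`, `b_name`, `doc_type`) and the sample documents are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: maps (field, term) to the set of matching doc ids."""

    def __init__(self):
        self.docs = []                      # stored documents, position = doc id
        self.postings = defaultdict(set)    # (field, term) -> {doc ids}

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for field, value in doc.items():
            for term in str(value).lower().split():
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def search(self, field, term, doc_type=None):
        # Copy the posting set so the optional type restriction never mutates the index.
        hits = set(self.postings.get((field, term.lower()), set()))
        if doc_type is not None:
            hits &= self.postings.get(("doc_type", doc_type.lower()), set())
        return [self.docs[i] for i in sorted(hits)]


# Hypothetical sample data for the two document types.
type_a = [{"a_title": "lucene in action"}, {"a_title": "search basics"}]
type_b = [{"b_name": "lucene meetup"}]

# Option 1: one index, no type field; only the field names keep the types apart.
option1 = ToyIndex()
for d in type_a + type_b:
    option1.add(d)

# Option 2: one index, every document tagged with a doc_type field.
option2 = ToyIndex()
for d in type_a:
    option2.add({**d, "doc_type": "A"})
for d in type_b:
    option2.add({**d, "doc_type": "B"})

# Option 3: one index per type, each searched separately.
index_a, index_b = ToyIndex(), ToyIndex()
for d in type_a:
    index_a.add(d)
for d in type_b:
    index_b.add(d)

print(option1.search("a_title", "lucene"))
print(option2.search("a_title", "lucene", doc_type="A"))
print(index_b.search("b_name", "lucene"))
```

In this toy model all three layouts return the same hits as long as the field names stay disjoint. The difference is where the type distinction lives: in the field names (option 1), in an extra term per document (option 2), or in the choice of index (option 3).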
So, which is the best option, from both performance and design perspectives?
Regards,
Art
RE: indexing design pattern question
Posted by Michael Garski <mg...@myspace-inc.com>.
Artem,
My preference would be either #2 or #3, as over time you may have some
overlap on the fields if your data changes. #2 gives you the advantage
of being able to add new data types as your application changes over
time, while #3 would require you to spin up a new index.
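The overlap concern can be illustrated with a small Python sketch. It uses plain lists and dicts rather than Lucene.Net, and the shared `title` field, the `doc_type` field and the sample documents are hypothetical. Once both types carry the same field name, a search on that field alone mixes the types; a type field (as in #2) is one way to tell them apart again.

```python
def search(docs, field, term, doc_type=None):
    """Return docs whose `field` contains `term`, optionally limited to one doc_type."""
    term = term.lower()
    return [
        d for d in docs
        if term in str(d.get(field, "")).lower().split()
        and (doc_type is None or d.get("doc_type") == doc_type)
    ]


# Hypothetical data: both document types have grown a shared "title" field.
docs = [
    {"title": "lucene in action", "doc_type": "A"},
    {"title": "lucene meetup", "doc_type": "B"},
]

# Searching on the field alone (option 1 behaviour) matches both types.
mixed = search(docs, "title", "lucene")

# Restricting by the type field (option 2) keeps only type A.
only_a = search(docs, "title", "lucene", doc_type="A")

print(len(mixed), len(only_a))
```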
As to whether to separate them or not, it depends :) If one of the data
types is more volatile and requires a lot more updates than the other, you
may want to separate them.
Michael
-----Original Message-----
From: Artem Chereisky [mailto:a.chereisky@gmail.com]
Sent: Thursday, March 11, 2010 3:29 PM
To: lucene-net-user@lucene.apache.org
Subject: indexing design pattern question