Posted to solr-user@lucene.apache.org by Robert Yu <RO...@morningstar.com> on 2011/09/30 03:54:20 UTC

split index horizontally

Is there an efficient way to handle my case?

Each document has several groups of fields; some groups are updated
frequently, others infrequently. Is it possible to maintain a separate
index per group but still search across all of them as ONE index?

 

In effect, this is a three-layer document model (I think the current
model has two layers):

document = {key: groups},...

groups = {group-name: fields},...

fields = {field-name: field-value},...

 

We can maintain an index for each group and search across them like this:

               query: group-name-1:field-1:val-1 AND (group-name-2:field-2:val-2 OR group-name-3:field-3:[min-3 TO max-3])

               return data: group-name-1:field-1,field-2;group-name-2:field-3,field-4,...
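
A minimal Python sketch of the intended semantics, not of anything Solr or Lucene provides. Each group is modeled as a plain in-memory dict keyed by document key; the data and the helper names (term, in_range, fetch) are made up for illustration. Each clause is resolved to a set of keys inside its own group index, and the boolean operators combine those sets.

```python
# One tiny in-memory "index" per group: {key: {field: value}}. All data is hypothetical.
indexes = {
    "group-name-1": {
        "k1": {"field-1": "val-1", "field-2": 10},
        "k2": {"field-1": "val-1", "field-2": 20},
        "k3": {"field-1": "val-1", "field-2": 30},
        "k4": {"field-1": "other", "field-2": 40},
    },
    "group-name-2": {
        "k1": {"field-2": "val-2"},
        "k2": {"field-2": "zzz"},
        "k3": {"field-2": "zzz"},
        "k4": {"field-2": "val-2"},
    },
    "group-name-3": {
        "k1": {"field-3": 5},
        "k2": {"field-3": 50},
        "k3": {"field-3": 500},
        "k4": {"field-3": 700},
    },
}

def term(group, field, value):
    """Keys whose group:field equals value."""
    return {k for k, f in indexes[group].items() if f.get(field) == value}

def in_range(group, field, lo, hi):
    """Keys whose group:field lies in the inclusive range [lo, hi]."""
    return {k for k, f in indexes[group].items() if field in f and lo <= f[field] <= hi}

def fetch(keys, wanted):
    """Return the requested fields per group for each key, as {key: {group: {field: value}}}."""
    return {
        key: {
            group: {f: indexes[group][key][f] for f in fields if f in indexes[group][key]}
            for group, fields in wanted.items()
        }
        for key in sorted(keys)
    }

# group-name-1:field-1:val-1 AND (group-name-2:field-2:val-2 OR group-name-3:field-3:[100 TO 1000])
hits = term("group-name-1", "field-1", "val-1") & (
    term("group-name-2", "field-2", "val-2") | in_range("group-name-3", "field-3", 100, 1000)
)
result = fetch(hits, {"group-name-1": ["field-1", "field-2"], "group-name-2": ["field-2"]})
```

Because every clause only touches its own group, updating one group's index would not require rebuilding the others; the cost moves to combining the per-group key sets at query time.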

 

Thanks,

Robert Yu

 


RE: split index horizontally (updated, a special join operation?)

Posted by Robert Yu <RO...@morningstar.com>.
I think we can treat this as a special join operation.
Here are some points to support that view.
1, Build each group as a separate index.
	Index 1, named group1
		Key
		Group 1's fields
	Index 2, named group2
		Key
		Group 2's fields
2, The query looks like an RDBMS join. For example:
	select g1.key, g1.field1, g2.field1 from group1 g1, group2 g2 where g1.field1 > 1 AND (g1.field2 < 100 OR g2.field1 < 99)
3, How can Solr/Lucene support the above query?
	It looks like they do not support it.
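
To make the join reading concrete, here is a small Python sketch of that select statement over two per-group indexes. It assumes the two groups are joined on equal keys (the SQL above leaves the join condition implicit), and the dicts and the join_query helper are hypothetical stand-ins, not Solr or Lucene structures.

```python
# Hypothetical per-group indexes, each mapping key -> that group's fields.
group1 = {
    "k1": {"field1": 5, "field2": 50},
    "k2": {"field1": 0, "field2": 10},
    "k3": {"field1": 7, "field2": 500},
    "k4": {"field1": 9, "field2": 900},
}
group2 = {
    "k1": {"field1": 200},
    "k3": {"field1": 42},
    "k4": {"field1": 150},
}

def join_query(g1_index, g2_index):
    """Join both group indexes on key, then apply the WHERE clause to each joined row."""
    rows = []
    # Assumption: rows are paired by equal key; keys missing from either index drop out.
    for key in sorted(g1_index.keys() & g2_index.keys()):
        g1, g2 = g1_index[key], g2_index[key]
        if g1["field1"] > 1 and (g1["field2"] < 100 or g2["field1"] < 99):
            rows.append((key, g1["field1"], g2["field1"]))  # select g1.key, g1.field1, g2.field1
    return rows

rows = join_query(group1, group2)
```

Note that the OR spans both groups, so the condition cannot be decided inside either index alone; it has to be evaluated after rows are matched up by key.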

I have two ideas for a solution.
First, is it possible to use the same docid for the same key in all indexes? If so, all we need is a global docid generator that produces the same docid for the same key, and each Hit would carry index information (maybe like a Segment).

	I reviewed the Lucene/Solr source code and found that docids seem to be generated internally while the index is built; more importantly, some operations depend on their order. In other words, you cannot give a document a smaller docid. Am I right?
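
The bookkeeping part of this first idea can be sketched in Python as below. GlobalDocIdGenerator is a hypothetical helper that lives outside Lucene; it only shows the key-to-id mapping and does not show any way to make Lucene accept externally chosen docids.

```python
class GlobalDocIdGenerator:
    """Hands out one integer id per key, in first-seen order (hypothetical helper)."""

    def __init__(self):
        self._ids = {}

    def doc_id(self, key):
        # Reuse the id if the key was seen before, otherwise allocate the next one.
        if key not in self._ids:
            self._ids[key] = len(self._ids)
        return self._ids[key]

gen = GlobalDocIdGenerator()
# Keys arrive in a different order for each group index.
group1_ids = [gen.doc_id(k) for k in ["a", "b", "c"]]
group2_ids = [gen.doc_id(k) for k in ["c", "a", "d"]]
```

The second list comes out non-increasing in arrival order, which is exactly the ordering problem raised above: an index that expects docids to grow as documents are added could not take these ids as they are.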

Second, let scoring merge results by key rather than by docid. Of course, this is not as efficient as merging by docid, but since Lucene has already built the index, I think it should still be fast enough.
	I'd like to hear your opinions on this topic.
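
A rough Python sketch of the second idea, assuming each per-group index can return its matching keys in ascending key order (an assumption, since Lucene iterates matches in docid order). The function names are illustrative only.

```python
import heapq

def intersect_sorted(keys_a, keys_b):
    """AND: two-pointer intersection of two ascending key lists."""
    i = j = 0
    out = []
    while i < len(keys_a) and j < len(keys_b):
        if keys_a[i] == keys_b[j]:
            out.append(keys_a[i])
            i += 1
            j += 1
        elif keys_a[i] < keys_b[j]:
            i += 1
        else:
            j += 1
    return out

def union_sorted(keys_a, keys_b):
    """OR: merge two ascending key lists, dropping duplicates."""
    out = []
    for key in heapq.merge(keys_a, keys_b):
        if not out or out[-1] != key:
            out.append(key)
    return out

# Hypothetical per-group hit lists, already sorted by key.
g1_hits = ["k1", "k3", "k4"]
g2_hits = ["k2", "k3"]
g3_hits = ["k4"]

# g1 AND (g2 OR g3), merged by key instead of by docid.
merged = intersect_sorted(g1_hits, union_sorted(g2_hits, g3_hits))
```

The merge itself is linear in the size of the hit lists; the extra cost compared with docid merging is in producing hits sorted by key and comparing keys rather than integers.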

Thanks,