Posted to solr-dev@lucene.apache.org by Jason Rutherglen <ja...@gmail.com> on 2009/08/11 01:30:53 UTC

Solr document server/component

It would be useful to enable a Solr document server (running
Cassandra, HBase, or Voldemort) to be transparently integrated
into a Solr search cluster. Meaning if I do a document update,
Solr underneath sends the document data to the dedicated
document server cluster. When performing a query, the results
would automatically include documents from the document servers.
This would fairly seamlessly separate the stored fields from the
indexed fields. Has there been work along these lines yet?

Re: Solr document server/component

Posted by Jason Rutherglen <ja...@gmail.com>.

> So, when the stored fields are fetched from the
> "DocumentService", is the version number sent as well?

Yes.

You're probably wondering how we manage the older versions of an
object (aka id + version) in the DocService. We'd probably
delete an old version from the DocService when it's deleted from
the index?

I use the word "object" to denote something different from a
document, which can be short-lived. An object persists across
multiple updates.
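
To make the cleanup idea concrete, here is a rough sketch of how old
versions of an object might be pruned once the index no longer
references them. Everything in it is hypothetical (the
ObjectVersionStore class, its method names, the use of a String as
the stored payload); it only illustrates the id + version
bookkeeping, not any existing Solr or Voldemort API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical store: one object id maps to many versions of its data.
class ObjectVersionStore {
    private final Map<String, TreeMap<Long, String>> objects = new HashMap<>();

    // Record the data for one (id, version) pair.
    void put(String id, long version, String data) {
        objects.computeIfAbsent(id, k -> new TreeMap<>()).put(version, data);
    }

    // True if this exact (id, version) pair is still stored.
    boolean has(String id, long version) {
        TreeMap<Long, String> versions = objects.get(id);
        return versions != null && versions.containsKey(version);
    }

    // Drop every version strictly older than the one the index still
    // references. Returns how many versions were removed.
    int pruneOlderThan(String id, long liveVersion) {
        TreeMap<Long, String> versions = objects.get(id);
        if (versions == null) {
            return 0;
        }
        // headMap is a live view of keys < liveVersion; clearing it
        // removes those entries from the backing TreeMap.
        Map<Long, String> stale = versions.headMap(liveVersion);
        int removed = stale.size();
        stale.clear();
        return removed;
    }

    // Remove the object entirely, e.g. after a delete-by-id in the index.
    void deleteObject(String id) {
        objects.remove(id);
    }
}
```

The pruning would presumably be triggered after an index commit,
once the superseded documents are gone from the searcher.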

We can start with the interface and provide a default
implementation using the Apache-licensed Voldemort
(http://project-voldemort.com)?

2009/8/11 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
>> This is why we'd want to use optimistic concurrency, meaning
>> versions of ids encoded in the documents (which would also be
>> stored in the index à la GData).
>
> So, when the stored fields are fetched from the "DocumentService", is the
> version number sent as well?
>
> As Grant mentioned, it would be nice to have the interface ready and
> make it possible to have multiple implementations.

Re: Solr document server/component

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.

> This is why we'd want to use optimistic concurrency, meaning
> versions of ids encoded in the documents (which would also be
> stored in the index à la GData).

So, when the stored fields are fetched from the "DocumentService", is the
version number sent as well?

As Grant mentioned, it would be nice to have the interface ready and
make it possible to have multiple implementations.

-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Solr document server/component

Posted by Jason Rutherglen <ja...@gmail.com>.

The ids would be stored in Solr. When a query is executed, the
ids would be looked up through the DocumentService interface,
which underneath would obtain the document data (aka stored
fields). Solr would then combine the two and return the results,
presenting the stored fields as it does today.

For updates, something similar would occur: the stored fields
are shuffled off to the DocumentService update method.
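
As a starting point for discussion, the interface could look
something like the sketch below. The names (DocumentService,
InMemoryDocumentService, ResultMerger) and signatures are made up
for illustration and do not exist in Solr; the in-memory map merely
stands in for a Voldemort, Cassandra, or HBase backend.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: Solr keeps the ids, this holds the stored fields.
interface DocumentService {
    // Fetch stored fields for the given ids; unknown ids are omitted.
    Map<String, Map<String, Object>> get(List<String> ids);

    // Store (or overwrite) the stored fields for one id.
    void update(String id, Map<String, Object> storedFields);
}

// Stand-in backend; a real one would talk to a remote document cluster.
class InMemoryDocumentService implements DocumentService {
    private final Map<String, Map<String, Object>> store = new HashMap<>();

    @Override
    public Map<String, Map<String, Object>> get(List<String> ids) {
        Map<String, Map<String, Object>> found = new LinkedHashMap<>();
        for (String id : ids) {
            Map<String, Object> fields = store.get(id);
            if (fields != null) {
                found.put(id, fields);
            }
        }
        return found;
    }

    @Override
    public void update(String id, Map<String, Object> storedFields) {
        // Copy so later changes by the caller do not leak into the store.
        store.put(id, new HashMap<>(storedFields));
    }
}

class ResultMerger {
    // Combine ranked ids from the index with stored fields from the
    // service, preserving the ranking order of the search hits.
    static List<Map<String, Object>> merge(List<String> rankedIds, DocumentService docs) {
        Map<String, Map<String, Object>> fetched = docs.get(rankedIds);
        List<Map<String, Object>> results = new ArrayList<>();
        for (String id : rankedIds) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", id);
            Map<String, Object> fields = fetched.get(id);
            if (fields != null) {
                doc.putAll(fields);
            }
            results.add(doc);
        }
        return results;
    }
}
```

Fetching all ids in one get call keeps the query path to a single
round trip to the document cluster per result page, which seems
preferable to one lookup per hit.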

> how are we going to deal with this mismatch?

This is why we'd want to use optimistic concurrency, meaning
versions of ids encoded in the documents (which would also be
stored in the index à la GData).
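
A minimal sketch of how that could work, assuming the index stores
the (id, version) pair of each committed document and the document
store keeps each version separately. VersionedDocStore and
VersionConflictException are hypothetical names, and version 0 is
used here to mean "no prior version"; none of this is an existing
Solr API.

```java
import java.util.HashMap;
import java.util.Map;

// Thrown when a writer's expected version is not the latest one.
class VersionConflictException extends RuntimeException {
    VersionConflictException(String message) {
        super(message);
    }
}

// Hypothetical document store keyed by id + version.
class VersionedDocStore {
    private final Map<String, Map<Long, Map<String, Object>>> versions = new HashMap<>();
    private final Map<String, Long> latest = new HashMap<>();

    // Optimistic write: succeeds only if the caller saw the latest
    // version (0 = object does not exist yet). Returns the new version.
    synchronized long put(String id, long expectedVersion, Map<String, Object> fields) {
        long current = latest.getOrDefault(id, 0L);
        if (current != expectedVersion) {
            throw new VersionConflictException(
                "id=" + id + " expected=" + expectedVersion + " current=" + current);
        }
        long next = current + 1;
        versions.computeIfAbsent(id, k -> new HashMap<>()).put(next, new HashMap<>(fields));
        latest.put(id, next);
        return next;
    }

    // Read the exact version the committed index refers to, rather
    // than whatever happens to be newest in the store.
    synchronized Map<String, Object> get(String id, long version) {
        Map<Long, Map<String, Object>> byVersion = versions.get(id);
        return byVersion == null ? null : byVersion.get(version);
    }
}
```

Because the searcher asks for the version it has committed, an
update that has reached the document store but not yet the index
would not show up in results.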

2009/8/11 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
> The point is that the search will be done on committed docs and the
> stored fields will show uncommitted data. How are we going to deal with
> this mismatch?
>
> On Tue, Aug 11, 2009 at 6:50 AM, Grant Ingersoll<gs...@apache.org> wrote:
>>
>>
>> On Aug 10, 2009, at 7:30 PM, Jason Rutherglen wrote:
>>
>>> It would be useful to enable a Solr document server (running
>>> Cassandra, HBase, or Voldemort) to be transparently integrated
>>> into a Solr search cluster. Meaning if I do a document update,
>>> Solr underneath sends the document data to the dedicated
>>> document server cluster. When performing a query, the results
>>> would automatically include documents from the document servers.
>>> This would fairly seamlessly separate the stored fields from the
>>> indexed fields. Has there been work along these lines yet?
>>
>> I don't think any work has been done on it, but you are correct.  I'd add in
>> memcached and even, gasp, a DB, as well...  Of course, the more important
>> thing is the interface.
>>

Re: Solr document server/component

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.

The point is that the search will be done on committed docs and the
stored fields will show uncommitted data. How are we going to deal with
this mismatch?

On Tue, Aug 11, 2009 at 6:50 AM, Grant Ingersoll<gs...@apache.org> wrote:
>
>
> On Aug 10, 2009, at 7:30 PM, Jason Rutherglen wrote:
>
>> It would be useful to enable a Solr document server (running
>> Cassandra, HBase, or Voldemort) to be transparently integrated
>> into a Solr search cluster. Meaning if I do a document update,
>> Solr underneath sends the document data to the dedicated
>> document server cluster. When performing a query, the results
>> would automatically include documents from the document servers.
>> This would fairly seamlessly separate the stored fields from the
>> indexed fields. Has there been work along these lines yet?
>
> I don't think any work has been done on it, but you are correct.  I'd add in
> memcached and even, gasp, a DB, as well...  Of course, the more important
> thing is the interface.
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Solr document server/component

Posted by Grant Ingersoll <gs...@apache.org>.

On Aug 10, 2009, at 7:30 PM, Jason Rutherglen wrote:

> It would be useful to enable a Solr document server (running
> Cassandra, HBase, or Voldemort) to be transparently integrated
> into a Solr search cluster. Meaning if I do a document update,
> Solr underneath sends the document data to the dedicated
> document server cluster. When performing a query, the results
> would automatically include documents from the document servers.
> This would fairly seamlessly separate the stored fields from the
> indexed fields. Has there been work along these lines yet?

I don't think any work has been done on it, but you are correct.  I'd  
add in memcached and even, gasp, a DB, as well...  Of course, the more  
important thing is the interface.