Posted to solr-user@lucene.apache.org by Jay Luker <lb...@reallywow.com> on 2011/01/28 19:29:03 UTC

Sending binary data as part of a query

Hi all,

Here is what I am interested in doing: I would like to send a
compressed integer bitset as a query to Solr. The integers in the
bitset are my document ids, and the result I want to get back is the
facet data for those documents.
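
As an illustration of the kind of payload described here, a minimal sketch in Python. The bit layout (bit i set means document id i is included, least significant bit first within each byte) and the function name encode_bitset are assumptions made for this example, not something stated in the thread.

```python
import zlib


def encode_bitset(doc_ids):
    """Pack non-negative integer ids into a bitset and zlib-compress it.

    Assumed layout: id i maps to bit (i % 8) of byte (i // 8).
    """
    if not doc_ids:
        return zlib.compress(b"")
    # One bit per possible id, up to the largest id present.
    bits = bytearray(max(doc_ids) // 8 + 1)
    for doc_id in doc_ids:
        bits[doc_id // 8] |= 1 << (doc_id % 8)
    return zlib.compress(bytes(bits))


payload = encode_bitset([0, 3, 9])
```

A sparse id set over a large id range compresses well this way, since most bytes of the bitset are zero.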

I have successfully created a QueryComponent class that, assuming it
has the integer bitset, can turn that into the necessary DocSetFilter
to pass to the searcher, get back the facets, etc. That part all works
right now because I'm using either canned or randomly generated
bitsets on the server side.

What I'm unsure how to do is actually send this compressed bitset from
a client to solr as part of the query. From what I can tell, the Solr
API classes that are involved in handling binary data as part of a
request assume that the data is a document to be added. For instance,
extending ContentStreamHandlerBase requires implementing some kind of
document loader and an UpdateRequestProcessorChain and a bunch of
other stuff that I don't really think I should need. Is there a
simpler way? Anyone tried or succeeded in doing anything similar to
this?

Thanks,
--jay

Re: Sending binary data as part of a query

Posted by Jay Luker <lb...@reallywow.com>.
On Mon, Jan 31, 2011 at 9:22 PM, Chris Hostetter
<ho...@fucit.org> wrote:

> that class should probably have been named ContentStreamUpdateHandlerBase
> or something like that -- it tries to encapsulate the logic that most
> RequestHandlers using ContentStreams (for updating) need to worry about.
>
> Your QueryComponent (as used by SearchHandler) should be able to access
> the ContentStreams the same way that class does ... call
> req.getContentStreams().
>
> Sending a binary stream from a remote client depends on how the client is
> implemented -- you can do it via HTTP using the POST body (with or w/o
> multi-part mime) in any language you want. If you are using SolrJ you may
> again run into an assumption that using ContentStreams means you are doing
> an "Update" but that's just a vernacular thing ... something like a
> ContentStreamUpdateRequest should work just as well for a query (as long
> as you set the necessary params and/or request handler path)

Thanks for the help. I was just about to reply to my own question for
the benefit of future googlers when I noticed your response. :)

I actually got this working, much the way you suggest. The client is
python. I created a gist with the script I used for testing [1].
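
For readers without access to the gist, a rough sketch of how a Python client could put binary data in the POST body while keeping the query params in the URL. This is not the script from [1]. The URL, the field name "year", and the function name build_query_request are hypothetical, and it assumes the request handler treats a POST body with a non-form content type as a raw content stream.

```python
import urllib.parse
import urllib.request
import zlib


def build_query_request(base_url, payload, params):
    """Build a POST request carrying binary data plus URL query params."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{base_url}?{query}",
        data=payload,  # raw bytes become the POST body
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Hypothetical endpoint and facet field, for illustration only.
req = build_query_request(
    "http://localhost:8983/solr/select",
    zlib.compress(b"\x09\x02"),
    {"q": "*:*", "facet": "true", "facet.field": "year"},
)
# To actually send it: urllib.request.urlopen(req).read()
```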

On the Solr side my QueryComponent grabs the stream, uses
jzlib.ZInputStream to do the inflating, then translates the incoming
integers in the bitset (my Solr schema.xml integer ids) to the Lucene
ids and creates a docSetFilter with them.
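
The inflate-and-decode step can be sketched as follows. It is written in Python only to show the logic; the actual component is Java and uses jzlib.ZInputStream. The bit layout (id i at bit i % 8 of byte i // 8) and the name decode_bitset are assumptions, and the translation from schema ids to Lucene ids is left out.

```python
import zlib


def decode_bitset(payload):
    """Inflate a zlib-compressed bitset and return the ids whose bits are set."""
    bits = zlib.decompress(payload)
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bits)
        for bit in range(8)
        if byte & (1 << bit)
    ]


ids = decode_bitset(zlib.compress(bytes([0b00001001, 0b00000010])))
```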

Very relieved to get this working as it's the basis of a talk I'm
giving next week [2]. :-)

--jay

[1] https://gist.github.com/806397
[2] http://code4lib.org/conference/2011/luker

Re: Sending binary data as part of a query

Posted by Chris Hostetter <ho...@fucit.org>.
: I have successfully created a QueryComponent class that, assuming it
: has the integer bitset, can turn that into the necessary DocSetFilter
: to pass to the searcher, get back the facets, etc. That part all works
	...
: What I'm unsure how to do is actually send this compressed bitset from
: a client to solr as part of the query. From what I can tell, the Solr
: API classes that are involved in handling binary data as part of a
: request assume that the data is a document to be added. For instance,
: extending ContentStreamHandlerBase requires implementing some kind of
: document loader and an UpdateRequestProcessorChain and a bunch of

that class should probably have been named ContentStreamUpdateHandlerBase 
or something like that -- it tries to encapsulate the logic that most 
RequestHandlers using ContentStreams (for updating) need to worry about.

Your QueryComponent (as used by SearchHandler) should be able to access 
the ContentStreams the same way that class does ... call 
req.getContentStreams().

Sending a binary stream from a remote client depends on how the client is 
implemented -- you can do it via HTTP using the POST body (with or w/o 
multi-part mime) in any language you want. If you are using SolrJ you may
again run into an assumption that using ContentStreams means you are doing 
an "Update" but that's just a vernacular thing ... something like a 
ContentStreamUpdateRequest should work just as well for a query (as long 
as you set the necessary params and/or request handler path)


-Hoss