Posted to dev@lucene.apache.org by Mikhail Khludnev <mk...@griddynamics.com> on 2012/04/12 22:17:07 UTC

Re: Responding to Requests with Chunks/Streaming

Hello Developers,

I just want to ask: don't you think response streaming could be useful
for things like OLAP? E.g., if you have a sharded index that is presorted and
pre-joined the BJQ way, you can calculate counts in many cube cells in
parallel.
The essential distributed test for response streaming has just passed:
https://github.com/m-khl/solr-patches/blob/ec4db7c0422a5515392a7019c5bd23ad3f546e4b/solr/core/src/test/org/apache/solr/response/RespStreamDistributedTest.java

branch is https://github.com/m-khl/solr-patches/tree/streaming
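
To make the OLAP idea a bit more concrete, here is a rough sketch of the counting side only. It assumes each shard streams one cube-cell key per matching document; the class name, method name, and key format are all hypothetical and are not taken from the branch above. Plain iterators stand in for the streamed shard responses.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class CubeCountSketch {

    /**
     * Consumes every shard stream in parallel and counts documents per cube cell.
     * Each Iterator is a stand-in (assumption) for one shard's streamed response.
     */
    static Map<String, Long> countCells(List<Iterator<String>> shardStreams) {
        ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();
        // One shard stream is drained by one worker; counters are shared and thread-safe.
        shardStreams.parallelStream().forEach(shard ->
                shard.forEachRemaining(cell ->
                        counters.computeIfAbsent(cell, k -> new LongAdder()).increment()));
        // Copy into a sorted map so the result is stable to print and compare.
        Map<String, Long> result = new TreeMap<>();
        counters.forEach((cell, adder) -> result.put(cell, adder.sum()));
        return result;
    }

    public static void main(String[] args) {
        List<Iterator<String>> shards = List.of(
                List.of("2012|RU", "2012|US", "2012|RU").iterator(),
                List.of("2012|US", "2011|RU").iterator());
        System.out.println(countCells(shards)); // {2011|RU=1, 2012|RU=2, 2012|US=2}
    }
}
```

Only the counters stay in memory here; the documents themselves are never collected, which is the point of combining streaming with this kind of aggregation.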

Regards

On Mon, Apr 2, 2012 at 10:55 AM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

>
> Hello,
>
> Small update: reading the streamed response is now done via a callback, with no
> SolrDocumentList held in memory.
> https://github.com/m-khl/solr-patches/tree/streaming
> Here is the test:
> https://github.com/m-khl/solr-patches/blob/d028d4fabe0c20cb23f16098637e2961e9e2366e/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java#L138
>
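
For anyone skimming the thread, the callback idea can be sketched roughly as below. This is only an illustration under stated assumptions: DocCallback and readStreamed are hypothetical names, not the SolrJ API and not the code in the linked test, and a lazily generated iterator stands in for documents being decoded off the wire.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.stream.LongStream;

public class CallbackReadSketch {

    /** Hypothetical callback, invoked once per document as soon as it is decoded. */
    interface DocCallback {
        void onDocument(Map<String, Object> doc);
    }

    /**
     * Stand-in for a streaming reader: pulls documents one at a time and hands
     * each to the callback. Nothing is accumulated in a list.
     */
    static long readStreamed(Iterator<Map<String, Object>> decoded, DocCallback callback) {
        long count = 0;
        while (decoded.hasNext()) {
            callback.onDocument(decoded.next()); // the doc is unreferenced after this call
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Lazily generated documents simulate a large response arriving incrementally.
        Iterator<Map<String, Object>> wire = LongStream.range(0, 1_000_000)
                .mapToObj(i -> Map.<String, Object>of("id", i))
                .iterator();
        long[] idSum = {0};
        long seen = readStreamed(wire, doc -> idSum[0] += (Long) doc.get("id"));
        System.out.println(seen + " docs, id sum " + idSum[0]);
    }
}
```

The consumer only ever holds one document at a time, so memory use does not grow with the number of hits.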
> No progress on distributed search via streaming yet.
>
> Please let me know if you would rather not get updates from my playground.
>
> Regards
>
>
> On Thu, Mar 29, 2012 at 1:02 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> @All
>> Why does nobody want such a pretty cool feature?
>>
>> Nicholas,
>> I have made a little progress: I'm able to stream in javabin codec format while
>> searching. This implies sorting by _docid_.
>>
>> Here is the diff:
>>
>> https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95
>>
>> The current issue is that SolrJ reads the response as a whole.
>> Reading via callback is supported by EmbeddedServer only. Anyway, it should
>> not be a big deal. ResponseStreamingTest.java somehow works.
>> I'm stuck on introducing response streaming in distributed search; it's
>> actually more challenging: RespStreamDistributedTest fails.
>>
>> Regards
>>
>>
>> On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball <nicholas.ball@nodelay.com
>> > wrote:
>>
>>>
>>> Mikhail & Ludovic,
>>>
>>> Thanks for both your replies, very helpful indeed!
>>>
>>> Ludovic, I was actually looking into just that and did some tests with
>>> SolrJ. It does work well but needs some changes on the Solr server if we
>>> want to send out individual documents at various times. This could be done
>>> with a write() and flush() to the FastOutputStream (daos) in
>>> JavaBinCodec. I therefore think that a combination of this and Mikhail's
>>> solution would work best!
>>>
>>> Mikhail, you mention that your solution doesn't currently work and that you
>>> are not sure why. Could it be that you haven't flushed the
>>> data (os.flush()) you've written in the collect method of
>>> DocSetStreamer? I think placing the output stream into the
>>> SolrQueryRequest is the way to go, so that we can access it and write to
>>> it however we intend. However, I think using the JavaBinCodec would be
>>> ideal so that we can work with SolrJ directly and not mess around with
>>> the encoding of the docs/data etc.
>>>
>>> At the moment the entry point to JavaBinCodec is through the
>>> BinaryResponseWriter, which calls the highest-level marshal() method, which
>>> encodes and sends out the entire SolrQueryResponse (line 49 @
>>> BinaryResponseWriter). What would be ideal is to be able to break up the
>>> response and call the JavaBinCodec for pieces of it, with a flush after
>>> each call. I did a few tests with a simple Thread.sleep and a flush to see
>>> if this would actually work, and it looks like it's working out perfectly.
>>> I'm just trying to figure out the best way to actually do it now :) Any ideas?
>>>
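
The "write a piece, then flush" approach described above can be sketched as follows. This is a minimal illustration and not JavaBinCodec itself: documents are written as length-prefixed UTF-8 records through a plain DataOutputStream, and FlushCountingStream is a hypothetical helper that exists only to make the flushes observable.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class FlushPerDocSketch {

    /** Hypothetical helper: counts flush() calls so the behaviour can be checked. */
    static final class FlushCountingStream extends FilterOutputStream {
        int flushes = 0;

        FlushCountingStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // pass whole buffers through in one call
        }

        @Override
        public void flush() throws IOException {
            flushes++;
            super.flush();
        }
    }

    /** Writes each document as a length-prefixed record and flushes right after it. */
    static void streamDocs(List<String> docs, OutputStream os) throws IOException {
        DataOutputStream data = new DataOutputStream(os);
        for (String doc : docs) {
            byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
            data.writeInt(bytes.length);
            data.write(bytes);
            data.flush(); // push this document to the client before producing the next
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        FlushCountingStream counting = new FlushCountingStream(sink);
        streamDocs(List.of("id=1", "id=2", "id=3"), counting);
        System.out.println(counting.flushes + " flushes, " + sink.size() + " bytes");
    }
}
```

In a servlet container the sink would be the response output stream, and whether each flush actually reaches the client right away depends on the container and on any buffering in between.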
>>> On another note, for a solution to work with chunked transfer encoding
>>> (and therefore web browsers), a lot more development is going to be
>>> needed. I'm not sure if it's worth trying yet, but I might look into it
>>> later down the line.
>>>
>>> Nick
>>>
>>> On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
>>> <mk...@griddynamics.com> wrote:
>>> > Ludovic,
>>> >
>>> > I looked through it. First of all, it seems to me you don't amend the regular
>>> > "servlet" Solr server, only the embedded one.
>>> > Anyway, the difference is that you stream the DocList via callback, but that
>>> > means you've instantiated it in memory and keep it there until it has
>>> > been completely consumed. Think about a numFound of a billion. The core idea of my
>>> > approach is to keep almost zero memory for the response.
>>> >
>>> > Regards
>>> >
>>> > On Fri, Mar 16, 2012 at 12:12 AM, lboutros <bo...@gmail.com> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I was looking for something similar.
>>> >>
>>> >> I tried this patch :
>>> >>
>>> >> https://issues.apache.org/jira/browse/SOLR-2112
>>> >>
>>> >> it's working quite well (I've back-ported the code in Solr 3.5.0...).
>>> >>
>>> >> Is it really different from what you are trying to achieve ?
>>> >>
>>> >> Ludovic.
>>> >>
>>> >> -----
>>> >> Jouve
>>> >> France.
>>> >> --
>>> >> View this message in context:
>>> >> http://lucene.472066.n3.nabble.com/Responding-to-Requests-with-Chunks-Streaming-tp3827316p3829909.html
>>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>>> >>
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> gedel@yandex.ru
>>
>> <http://www.griddynamics.com>
>>  <mk...@griddynamics.com>
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> gedel@yandex.ru
>
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
>
>


-- 
Sincerely yours
Mikhail Khludnev
gedel@yandex.ru

<http://www.griddynamics.com>
 <mk...@griddynamics.com>