Posted to solr-user@lucene.apache.org by Mikhail Khludnev <mk...@griddynamics.com> on 2012/04/02 08:55:05 UTC

Re: Responding to Requests with Chunks/Streaming

Hello,

Small update: reading the streamed response is now done via a callback, so no
SolrDocumentList is held in memory.
https://github.com/m-khl/solr-patches/tree/streaming
Here is the test:
https://github.com/m-khl/solr-patches/blob/d028d4fabe0c20cb23f16098637e2961e9e2366e/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java#L138
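
For illustration, the client side of such a callback read, using the stock
SolrJ API from SOLR-2112 (SolrServer.queryAndStreamResponse with a
StreamingResponseCallback), would look roughly like the sketch below; the hook
in this branch may differ:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.StreamingResponseCallback;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class StreamingReadSketch {
      public static void main(String[] args) throws Exception {
        // HttpSolrServer on trunk; on 3.x you would use CommonsHttpSolrServer
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("q", "*:*");
        params.set("sort", "_docid_ asc"); // streaming emits hits in index order

        // each document is handed to the callback as it is read off the wire,
        // so no SolrDocumentList is ever accumulated on the client
        server.queryAndStreamResponse(params, new StreamingResponseCallback() {
          @Override
          public void streamSolrDocument(SolrDocument doc) {
            System.out.println(doc.getFieldValue("id"));
          }

          @Override
          public void streamDocListInfo(long numFound, long start, Float maxScore) {
            System.out.println("numFound=" + numFound);
          }
        });
      }
    }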

No progress on streaming in distributed search yet.

Please let me know if you'd rather not get updates from my playground.

Regards

On Thu, Mar 29, 2012 at 1:02 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> @All
> Why does nobody want such a cool feature?
>
> Nicholas,
> I have made a little progress: I'm able to stream in javabin codec format
> while searching. It implies sorting by _docid_.
>
> Here is the diff:
>
> https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95
>
> The current issue is that SolrJ reads the response as a whole; reading via a
> callback is supported by the embedded server only. Anyway, it should not be a
> big deal - ResponseStreamingTest.java more or less works.
> I'm stuck on introducing response streaming into distributed search, which is
> actually more challenging - RespStreamDistributedTest fails.
>
> Regards
>
>
> On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball <ni...@nodelay.com> wrote:
>
>>
>> Mikhail & Ludovic,
>>
>> Thanks for both your replies, very helpful indeed!
>>
>> Ludovic, I was actually looking into just that and did some tests with
>> SolrJ. It does work well, but it needs some changes on the Solr server if we
>> want to send out individual documents at various times. This could be done
>> with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec. I
>> therefore think that a combination of this and Mikhail's solution would
>> work best!
>>
>> Mikhail, you mention that your solution doesn't currently work and that
>> you're not sure why. Could it be that you haven't flushed the data
>> (os.flush()) you've written in the collect method of DocSetStreamer? I think
>> placing the output stream into the SolrQueryRequest is the way to go, so that
>> we can access it and write to it as we intend. However, I think using the
>> JavaBinCodec would be ideal, so that we can work with SolrJ directly and not
>> mess around with the encoding of the docs/data etc.
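>>
>> To make the flush point concrete, here is a rough, hypothetical sketch of the
>> write-then-flush idea (FlushingDocWriter and onHit() are made-up names, not
>> the real DocSetStreamer):
>>
>>     import java.io.DataOutputStream;
>>     import java.io.IOException;
>>     import java.io.OutputStream;
>>
>>     // Hypothetical helper: whatever collect() writes must be flushed, or it
>>     // just sits in the output buffer until the whole search has finished.
>>     public class FlushingDocWriter {
>>       private final DataOutputStream out;
>>
>>       public FlushingDocWriter(OutputStream os) {
>>         this.out = new DataOutputStream(os);
>>       }
>>
>>       // call from Collector.collect(int doc), with the segment docBase added
>>       public void onHit(int globalDocId) throws IOException {
>>         out.writeInt(globalDocId); // write this hit...
>>         out.flush();               // ...and push it to the client right away
>>       }
>>     }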
>>
>> At the moment the entry point to JavaBinCodec is through
>> BinaryResponseWriter, which calls the top-level marshal() method that encodes
>> and sends out the entire SolrQueryResponse (line 49 @ BinaryResponseWriter).
>> What would be ideal is to be able to break up the response and call the
>> JavaBinCodec for pieces of it, with a flush after each call. I did a few tests
>> with a simple Thread.sleep and a flush to see if this would actually work, and
>> it looks like it's working out perfectly. Just trying to figure out the best
>> way to actually do it now :) any ideas?
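>>
>> The experiment looked roughly like the sketch below (a hypothetical class; it
>> only demonstrates marshalling the response in pieces and flushing after each
>> piece, with Thread.sleep standing in for a slow search):
>>
>>     import java.io.IOException;
>>     import java.io.OutputStream;
>>
>>     import org.apache.solr.common.util.JavaBinCodec;
>>
>>     // Hypothetical sketch: encode the response piece by piece instead of one
>>     // big marshal() call, flushing so the client sees each piece right away.
>>     public class ChunkedMarshalSketch {
>>       public void writePieces(Iterable<?> pieces, OutputStream os)
>>           throws IOException, InterruptedException {
>>         for (Object piece : pieces) {
>>           new JavaBinCodec().marshal(piece, os); // encode just this piece
>>           os.flush();                            // push it out immediately
>>           Thread.sleep(1000);                    // simulate a slow search
>>         }
>>       }
>>     }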
>>
>> On another note, for a solution to work with chunked transfer encoding (and
>> therefore web browsers), a lot more development is going to be needed. Not
>> sure if it's worth trying yet, but I might look into it later down the line.
>>
>> Nick
>>
>> On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
>> <mk...@griddynamics.com> wrote:
>> > Ludovic,
>> >
>> > I looked through it. First of all, it seems to me you don't amend the
>> > regular "servlet" Solr server, only the embedded one. Anyway, the
>> > difference is that you stream the DocList via a callback, which means
>> > you've instantiated it in memory and keep it there until it is completely
>> > consumed. Think about a billion numFound. The core idea of my approach is
>> > to keep almost zero memory for the response.
>> >
>> > Regards
>> >
>> > On Fri, Mar 16, 2012 at 12:12 AM, lboutros <bo...@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> I was looking for something similar.
>> >>
>> >> I tried this patch:
>> >>
>> >> https://issues.apache.org/jira/browse/SOLR-2112
>> >>
>> >> It's working quite well (I've back-ported the code to Solr 3.5.0...).
>> >>
>> >> Is it really different from what you are trying to achieve?
>> >>
>> >> Ludovic.
>> >>
>> >> -----
>> >> Jouve
>> >> France.
>> >> --
>> >> View this message in context:
>> >> http://lucene.472066.n3.nabble.com/Responding-to-Requests-with-Chunks-Streaming-tp3827316p3829909.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> gedel@yandex.ru
>
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
>
>


-- 
Sincerely yours
Mikhail Khludnev
gedel@yandex.ru

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Responding to Requests with Chunks/Streaming

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hello Developers,

I just want to ask: don't you think that response streaming could be useful
for things like OLAP? E.g. if you have a sharded index that is presorted and
pre-joined the BJQ way, you can calculate counts for many cube cells in
parallel.
The essential distributed test for response streaming has just passed:
https://github.com/m-khl/solr-patches/blob/ec4db7c0422a5515392a7019c5bd23ad3f546e4b/solr/core/src/test/org/apache/solr/response/RespStreamDistributedTest.java

The branch is https://github.com/m-khl/solr-patches/tree/streaming

Regards

On Mon, Apr 2, 2012 at 10:55 AM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

>
> Hello,
>
> Small update: reading the streamed response is now done via a callback, so no
> SolrDocumentList is held in memory.
> https://github.com/m-khl/solr-patches/tree/streaming
> Here is the test:
> https://github.com/m-khl/solr-patches/blob/d028d4fabe0c20cb23f16098637e2961e9e2366e/solr/core/src/test/org/apache/solr/response/ResponseStreamingTest.java#L138
>
> No progress on streaming in distributed search yet.
>
> Please let me know if you'd rather not get updates from my playground.
>
> Regards
>
>
> On Thu, Mar 29, 2012 at 1:02 PM, Mikhail Khludnev <
> mkhludnev@griddynamics.com> wrote:
>
>> @All
>> Why does nobody want such a cool feature?
>>
>> Nicholas,
>> I have made a little progress: I'm able to stream in javabin codec format
>> while searching. It implies sorting by _docid_.
>>
>> Here is the diff:
>>
>> https://github.com/m-khl/solr-patches/commit/2f9ff068c379b3008bb983d0df69dff714ddde95
>>
>> The current issue is that SolrJ reads the response as a whole; reading via a
>> callback is supported by the embedded server only. Anyway, it should not be a
>> big deal - ResponseStreamingTest.java more or less works.
>> I'm stuck on introducing response streaming into distributed search, which is
>> actually more challenging - RespStreamDistributedTest fails.
>>
>> Regards
>>
>>
>> On Fri, Mar 16, 2012 at 3:51 PM, Nicholas Ball <nicholas.ball@nodelay.com> wrote:
>>
>>>
>>> Mikhail & Ludovic,
>>>
>>> Thanks for both your replies, very helpful indeed!
>>>
>>> Ludovic, I was actually looking into just that and did some tests with
>>> SolrJ. It does work well, but it needs some changes on the Solr server if we
>>> want to send out individual documents at various times. This could be done
>>> with a write() and flush() to the FastOutputStream (daos) in JavaBinCodec. I
>>> therefore think that a combination of this and Mikhail's solution would
>>> work best!
>>>
>>> Mikhail, you mention that your solution doesn't currently work and that
>>> you're not sure why. Could it be that you haven't flushed the data
>>> (os.flush()) you've written in the collect method of DocSetStreamer? I think
>>> placing the output stream into the SolrQueryRequest is the way to go, so that
>>> we can access it and write to it as we intend. However, I think using the
>>> JavaBinCodec would be ideal, so that we can work with SolrJ directly and not
>>> mess around with the encoding of the docs/data etc.
>>>
>>> At the moment the entry point to JavaBinCodec is through
>>> BinaryResponseWriter, which calls the top-level marshal() method that encodes
>>> and sends out the entire SolrQueryResponse (line 49 @ BinaryResponseWriter).
>>> What would be ideal is to be able to break up the response and call the
>>> JavaBinCodec for pieces of it, with a flush after each call. I did a few
>>> tests with a simple Thread.sleep and a flush to see if this would actually
>>> work, and it looks like it's working out perfectly. Just trying to figure out
>>> the best way to actually do it now :) any ideas?
>>>
>>> On another note, for a solution to work with chunked transfer encoding (and
>>> therefore web browsers), a lot more development is going to be needed. Not
>>> sure if it's worth trying yet, but I might look into it later down the line.
>>>
>>> Nick
>>>
>>> On Fri, 16 Mar 2012 07:29:20 +0300, Mikhail Khludnev
>>> <mk...@griddynamics.com> wrote:
>>> > Ludovic,
>>> >
>>> > I looked through it. First of all, it seems to me you don't amend the
>>> > regular "servlet" Solr server, only the embedded one. Anyway, the
>>> > difference is that you stream the DocList via a callback, which means
>>> > you've instantiated it in memory and keep it there until it is completely
>>> > consumed. Think about a billion numFound. The core idea of my approach is
>>> > to keep almost zero memory for the response.
>>> >
>>> > Regards
>>> >
>>> > On Fri, Mar 16, 2012 at 12:12 AM, lboutros <bo...@gmail.com> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> I was looking for something similar.
>>> >>
>>> >> I tried this patch:
>>> >>
>>> >> https://issues.apache.org/jira/browse/SOLR-2112
>>> >>
>>> >> It's working quite well (I've back-ported the code to Solr 3.5.0...).
>>> >>
>>> >> Is it really different from what you are trying to achieve?
>>> >>
>>> >> Ludovic.
>>> >>
>>> >> -----
>>> >> Jouve
>>> >> France.
>>> >> --
>>> >> View this message in context:
>>> >> http://lucene.472066.n3.nabble.com/Responding-to-Requests-with-Chunks-Streaming-tp3827316p3829909.html
>>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>>> >>
>>>
>>
>>
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> gedel@yandex.ru
>>
>> <http://www.griddynamics.com>
>>  <mk...@griddynamics.com>
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> gedel@yandex.ru
>
> <http://www.griddynamics.com>
>  <mk...@griddynamics.com>
>
>


-- 
Sincerely yours
Mikhail Khludnev
gedel@yandex.ru

<http://www.griddynamics.com>
 <mk...@griddynamics.com>
