Posted to blur-user@incubator.apache.org by Ravikumar Govindarajan <ra...@gmail.com> on 2016/06/28 10:37:50 UTC

Re: Help needed on SearchExecutor...

Aaron,

Just an update..

https://issues.apache.org/jira/browse/LUCENE-5299

You can now use any collector & get guaranteed parallel execution. They
have also provided a "parallelism" hint that will limit the number of
search threads at request level...

i.e., we can fix blur executor thread-count at 128 & limit "parallelism" at
a max of 4 threads per request..
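
A minimal sketch of the "shared pool, capped per request" idea, assuming a plain java.util.concurrent setup. This is not the LUCENE-5299 API: RequestBoundedExecutor and observedPeak are hypothetical names made up for illustration, and wiring such a wrapper into Blur or into IndexSearcher is not shown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-request wrapper: runs tasks on a shared pool, at most maxParallel at a time. */
final class RequestBoundedExecutor implements Executor {
  private final Executor shared;
  private final Semaphore permits;

  RequestBoundedExecutor(Executor shared, int maxParallel) {
    this.shared = shared;
    this.permits = new Semaphore(maxParallel);
  }

  @Override
  public void execute(Runnable task) {
    // Blocks the submitting thread while this request already has maxParallel tasks in flight.
    permits.acquireUninterruptibly();
    try {
      shared.execute(() -> {
        try {
          task.run();
        } finally {
          permits.release();
        }
      });
    } catch (RuntimeException e) {
      permits.release(); // the shared pool refused the task, so give the permit back
      throw e;
    }
  }

  /** Demo helper: submits short tasks through one wrapper and reports the peak concurrency seen. */
  static int observedPeak(int poolThreads, int maxParallel, int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(poolThreads);
    try {
      Executor perRequest = new RequestBoundedExecutor(pool, maxParallel);
      AtomicInteger running = new AtomicInteger();
      AtomicInteger peak = new AtomicInteger();
      CountDownLatch done = new CountDownLatch(tasks);
      for (int i = 0; i < tasks; i++) {
        perRequest.execute(() -> {
          peak.accumulateAndGet(running.incrementAndGet(), Math::max);
          try {
            Thread.sleep(5); // pretend to search one segment
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          running.decrementAndGet();
          done.countDown();
        });
      }
      done.await();
      return peak.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // 128 shared threads, but this one request never uses more than 4 of them at once.
    System.out.println("peak concurrency = " + observedPeak(128, 4, 40));
  }
}
```

One wrapper would be created per search request, so the cap applies to that request only while the 128 threads stay shared across requests.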

On Fri, Feb 6, 2015 at 5:25 PM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Thanks for the clarifications.
>
> Another point I thought about is the disk efficiency of serving
> random I/O. Many parallel threads could end up hitting just one or two disks
> in the cluster…
>
> Think I can skip it safely for my work-loads.
>
> --
> Ravi
>
> On Fri, Feb 6, 2015 at 3:09 PM, Aaron McCurry <am...@gmail.com> wrote:
>
>> The ServiceExecutor (thread pool) put inside the IndexSearcher was an
>> attempt at making the segments search in parallel when available.  However
>> there is a limitation in Lucene that does not allow segment parallel
>> searches when you are using Collectors.
>>
>>
>> https://github.com/apache/lucene-solr/blob/lucene_solr_4_3_0/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L595
>>
>> We override this method to allow for Tracing:
>>
>>
>> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/server/IndexSearcherCloseableBase.java#L46
>>
>> and here:
>>
>>
>> https://github.com/apache/incubator-blur/blob/master/blur-core/src/main/java/org/apache/blur/server/IndexSearcherCloseableSecureBase.java#L51
>>
>> I agree that if you are already running a lot of shards per server, then
>> enhancing Lucene to allow for parallel searching of segments
>> could become counterproductive.  I have seen underutilized systems that
>> could take advantage of the parallel segment search, so as with any
>> feature like this, it depends.  :-)
>>
>> Aaron
>>
>> On Fri, Feb 6, 2015 at 2:39 AM, Ravikumar Govindarajan <
>> ravikumar.govindarajan@gmail.com> wrote:
>>
>> > Blur by default uses a SearchExecutor for IndexSearcher. I believe
>> > lucene helps searching segments of a single shard in parallel.
>> >
>> > Our previous index was built on a lower version of lucene where such a
>> > feature was absent and we ran sequential search per shard only…
>> >
>> > What is the general recommendation for blur? Is it advisable to use the
>> > SearchExecutor? What will happen when there are many parallel queries
>> > for different shards? Will SearchExecutor become a bottleneck?
>> >
>> > Any help is much appreciated...
>> >
>> > --
>> > Ravi
>> >
>>
>
>

Re: Help needed on SearchExecutor...

Posted by Aaron McCurry <am...@gmail.com>.
Yeah I think that would work.

On Thu, Jul 7, 2016 at 9:58 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> I just now looked at IndexSearcherCloseableSecureBase.java
>
> Guess if we want to cap each search request with max-of "n" threads, we can
> plug the above logic into this class directly instead of
> BlurIndexSimpleWriter.java

Re: Help needed on SearchExecutor...

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
I just now looked at IndexSearcherCloseableSecureBase.java

Guess if we want to cap each search request with max-of "n" threads, we can
plug the above logic into this class directly instead of
BlurIndexSimpleWriter.java

On Wed, Jun 29, 2016 at 6:04 PM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> This is really nice Aaron. You've done the bulk of work already!!!
>
> I think parallelism can be provided too for searching a single shard....
>
> Just as a quick proposal, we can do a static initialization in
> BlurIndexSimpleWriter
>
> static final LinkedBlockingQueue<ExecutorService> executorQueue =
>     new LinkedBlockingQueue<ExecutorService>(128 / 4);
>
> static {
>   for (int i = 0; i < 128 / 4; i++) {
>     executorQueue.add(Executors.newFixedThreadPool(4));
>   }
> }
> ----
>
> Incoming search request per-shard...
>
> public IndexSearcher getIndexSearcher() {
>   .....
>   ExecutorService current = executorQueue.poll();
>
>   if (current == null) {
>     // All thread-pools are busy or user has explicitly switched off via config.
>     // Search proceeds in single-threaded fashion utilizing the calling thread itself.
>   }
>
>   return new IndexSearcherCloseable(indexReader, current);
> }
> ---
>
> Btw, we can do this by over-riding a single method
> IndexSearcher.slices(...) in lucene 5.x & above!!!

Re: Help needed on SearchExecutor...

Posted by Ravikumar Govindarajan <ra...@gmail.com>.
This is really nice Aaron. You've done the bulk of work already!!!

I think parallelism can be provided too for searching a single shard....

Just as a quick proposal, we can do a static initialization in
BlurIndexSimpleWriter

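static final LinkedBlockingQueue<ExecutorService> executorQueue =
    new LinkedBlockingQueue<ExecutorService>(128 / 4);

static {
  // 32 pools of 4 threads each: 128 search threads in total.
  for (int i = 0; i < 128 / 4; i++) {
    executorQueue.add(Executors.newFixedThreadPool(4));
  }
}
----

Incoming search request per-shard...

public IndexSearcher getIndexSearcher() {
  .....
  // Non-blocking: poll() returns null when no pool is free.
  ExecutorService current = executorQueue.poll();

  if (current == null) {
    // All thread-pools are busy or user has explicitly switched off via config.
    // Search proceeds in single-threaded fashion utilizing the calling thread itself.
  }

  // The pool has to be offered back to executorQueue once the search is done.
  return new IndexSearcherCloseable(indexReader, current);
}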
---
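
The proposal above leaves out how a pool gets back into the queue. A rough sketch of the full checkout/return cycle follows, assuming the lease can be scoped around one search call. SearchPoolLease and withPool are hypothetical names, not Blur code; in Blur the return would more likely happen when the searcher is closed.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class SearchPoolLease {
  static final int TOTAL_THREADS = 128;
  static final int THREADS_PER_REQUEST = 4;

  /** 32 small pools; a request either leases one or runs single-threaded. */
  static final LinkedBlockingQueue<ExecutorService> POOLS =
      new LinkedBlockingQueue<>(TOTAL_THREADS / THREADS_PER_REQUEST);

  static {
    for (int i = 0; i < TOTAL_THREADS / THREADS_PER_REQUEST; i++) {
      // Daemon threads so idle pools never keep the JVM alive.
      POOLS.add(Executors.newFixedThreadPool(THREADS_PER_REQUEST, r -> {
        Thread t = new Thread(r, "search-pool");
        t.setDaemon(true);
        return t;
      }));
    }
  }

  /** Runs one search with a leased pool, or with null when every pool is busy. */
  static <T> T withPool(Function<ExecutorService, T> search) {
    ExecutorService pool = POOLS.poll(); // non-blocking; null means "search on the calling thread"
    try {
      return search.apply(pool);
    } finally {
      if (pool != null) {
        POOLS.offer(pool); // hand the pool back even if the search threw
      }
    }
  }

  public static void main(String[] args) {
    String mode = withPool(pool -> pool == null ? "single-threaded" : "parallel");
    System.out.println("mode = " + mode + ", free pools = " + POOLS.size());
  }
}
```

The try/finally is the important part: without it, a failed search would leak its pool and every later request would slowly drift to single-threaded execution.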

Btw, we can do this by over-riding a single method
IndexSearcher.slices(...) in lucene 5.x & above!!!
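
To make the slices(...) idea concrete: as far as I know the default in lucene 5.x puts every segment in its own slice, so capping parallelism means grouping segments into at most "n" slices. Below is a stand-in sketch of just that grouping step in plain Java, with no Lucene types. SliceGrouping and group are hypothetical names, and balancing by document count is my assumption, not something the thread prescribes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SliceGrouping {
  /** Groups segment indexes into at most maxSlices groups, balancing by document count. */
  static List<List<Integer>> group(int[] docCounts, int maxSlices) {
    if (maxSlices < 1) {
      throw new IllegalArgumentException("maxSlices must be >= 1");
    }
    int n = Math.min(maxSlices, docCounts.length);
    List<List<Integer>> groups = new ArrayList<>();
    long[] load = new long[n];
    for (int i = 0; i < n; i++) {
      groups.add(new ArrayList<>());
    }

    // Visit segments from largest to smallest so big segments spread out first.
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < docCounts.length; i++) {
      order.add(i);
    }
    order.sort(Comparator.comparingInt((Integer i) -> docCounts[i]).reversed());

    for (int segment : order) {
      // Put the segment into the group that currently holds the fewest documents.
      int lightest = 0;
      for (int g = 1; g < n; g++) {
        if (load[g] < load[lightest]) {
          lightest = g;
        }
      }
      groups.get(lightest).add(segment);
      load[lightest] += docCounts[segment];
    }
    return groups;
  }

  public static void main(String[] args) {
    int[] docCounts = {900, 500, 400, 100, 50, 50};
    System.out.println(group(docCounts, 4));
  }
}
```

In a real override, each inner list would become one slice, so a request would run at most maxSlices search tasks no matter how many segments the shard has.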


On Tue, Jun 28, 2016 at 8:01 PM, Aaron McCurry <am...@gmail.com> wrote:

> Some time ago I created something similar, it's kinda a backport into
> Lucene 4.3:
>
>
> https://github.com/apache/incubator-blur/blob/65640200a8e7dd539c1dd4d920255c717102b9b2/blur-query/src/main/java/org/apache/blur/lucene/search/CloneableCollector.java#L25
>
> It handles the execution of searching the segments in parallel but
> doesn't provide any limitations on parallelism.
>
> Aaron

Re: Help needed on SearchExecutor...

Posted by Aaron McCurry <am...@gmail.com>.
Some time ago I created something similar, it's kinda a backport into
Lucene 4.3:

https://github.com/apache/incubator-blur/blob/65640200a8e7dd539c1dd4d920255c717102b9b2/blur-query/src/main/java/org/apache/blur/lucene/search/CloneableCollector.java#L25

It handles the execution of searching the segments in parallel but
doesn't provide any limitations on parallelism.

Aaron
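
For readers who have not opened the link, the general pattern behind a cloneable collector can be sketched roughly like this: a collector is stateful and not thread-safe, so each segment gets its own copy and the copies are merged once all segments are done. This is only an illustration in plain Java; CountingCollector, newInstance and merge are hypothetical names and not the actual CloneableCollector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSegmentCount {
  /** Stand-in for a stateful collector: not thread-safe, so each segment gets its own copy. */
  static final class CountingCollector {
    int hits;

    void collect(int score, int minScore) {
      if (score >= minScore) {
        hits++;
      }
    }

    CountingCollector newInstance() {
      return new CountingCollector();
    }

    void merge(CountingCollector other) {
      hits += other.hits;
    }
  }

  /** Each int[] plays the role of one segment's document scores. */
  static int countHits(int[][] segments, int minScore, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      CountingCollector root = new CountingCollector();
      List<Callable<CountingCollector>> tasks = new ArrayList<>();
      for (int[] segment : segments) {
        CountingCollector perSegment = root.newInstance();
        tasks.add(() -> {
          for (int score : segment) {
            perSegment.collect(score, minScore);
          }
          return perSegment;
        });
      }
      // invokeAll waits for every segment; merging then happens on the calling thread only.
      for (Future<CountingCollector> result : pool.invokeAll(tasks)) {
        root.merge(result.get());
      }
      return root.hits;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int[][] segments = {{1, 5, 9}, {7, 2}, {8, 8, 3}};
    System.out.println("hits = " + countHits(segments, 5, 2));
  }
}
```

Note that the number of tasks here equals the number of segments, which is the "no limitations on parallelism" point: only the size of the pool bounds how many run at once.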



On Tue, Jun 28, 2016 at 6:37 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> Aaron,
>
> Just an update..
>
> https://issues.apache.org/jira/browse/LUCENE-5299
>
> You can now use any collector & get guaranteed parallel execution. They
> have also provided a "parallelism" hint that will limit the number of
> search threads at request level...
>
> i.e., we can fix blur executor thread-count at 128 & limit "parallelism" at
> a max of 4 threads per request..