Posted to java-user@lucene.apache.org by Tarun Kumar <ta...@sumologic.com> on 2016/06/28 11:05:58 UTC

lucene index reader performance

I am running Lucene 4.6.1 and trying to fetch the documents corresponding to a
set of docIds. All threads appear stuck (they are not stuck exactly, but they
spend a LOT of time) at:

java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:731)
        at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:716)
        at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:169)
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:271)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:51)
        at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
        at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:218)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:232)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:277)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:440)


There is no disk throttling. What could be causing this?

Thanks
Tarun

Re: lucene index reader performance

Posted by Michael McCandless <lu...@mikemccandless.com>.
Somehow you need to get the sorting done server-side ... that's really the only
way to handle your use case efficiently.

Why can't you sort the results of each request on your N shards, and then do a
merge sort on the client side to get the top hits?
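
For illustration, a client-side k-way merge could look like the following
sketch (the ShardHit type and the per-shard result lists are made up for the
example; they are not from this thread). Each shard returns its hits already
sorted by time, and a min-heap repeatedly takes the globally smallest next hit
until the top n have been collected:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// One hit as returned by a shard: its sort key plus enough to fetch it later.
class ShardHit {
  final long time;   // sort key
  final int docId;   // shard-local docId
  final int shard;   // which shard it came from
  ShardHit(long time, int docId, int shard) {
    this.time = time;
    this.docId = docId;
    this.shard = shard;
  }
}

class ClientMerge {
  // Each inner list is one shard's hits, already sorted by ascending time.
  static List<ShardHit> topN(List<List<ShardHit>> perShard, int n) {
    // Min-heap of cursors {shard index, offset}, ordered by the hit's time.
    PriorityQueue<int[]> heap = new PriorityQueue<>(
        Comparator.comparingLong((int[] c) -> perShard.get(c[0]).get(c[1]).time));
    for (int s = 0; s < perShard.size(); s++) {
      if (!perShard.get(s).isEmpty()) {
        heap.add(new int[] {s, 0});
      }
    }
    List<ShardHit> merged = new ArrayList<>();
    while (merged.size() < n && !heap.isEmpty()) {
      int[] c = heap.poll();
      List<ShardHit> hits = perShard.get(c[0]);
      merged.add(hits.get(c[1]));
      if (c[1] + 1 < hits.size()) {
        heap.add(new int[] {c[0], c[1] + 1});  // advance this shard's cursor
      }
    }
    return merged;
  }
}

The heap never holds more than one cursor per shard, so client-side memory
stays bounded no matter how large n is.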

Mike McCandless

http://blog.mikemccandless.com

Re: lucene index reader performance

Posted by Tarun Kumar <ta...@sumologic.com>.
Any suggestions, please?

Re: lucene index reader performance

Posted by Tarun Kumar <ta...@sumologic.com>.
Hey Michael,

docIds from multiple indices (on multiple machines) need to be aggregated and
sorted, and then the first few thousand need to be queried. These few thousand
docs can be distributed across multiple machines, and each machine will fetch
the docs that live in its own indices. So pulling the sorting onto the server
side won't cover the use case. Is there an alternative way to get the documents
for given docIds faster?

Thanks
Tarun

Re: lucene index reader performance

Posted by Michael McCandless <lu...@mikemccandless.com>.
Why not ask Lucene to do the sort on your time field, instead of pulling
millions of docIds to the client and sorting them there? You could even do
index-time sorting by the time field if you want, which makes early termination
possible (faster sorted searches).
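
For concreteness, a query-time sort in Lucene 4.6 might look like the sketch
below; it assumes the time field was indexed as a numeric long field named
"time", and the index path and page size are placeholders:

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SortedSearch {
  public static void main(String[] args) throws Exception {
    DirectoryReader reader =
        DirectoryReader.open(FSDirectory.open(new File("/path/to/index")));
    IndexSearcher searcher = new IndexSearcher(reader);

    // Lucene does the sort; only the top n hits ever reach the client.
    Sort byTime = new Sort(new SortField("time", SortField.Type.LONG));
    TopDocs top = searcher.search(new MatchAllDocsQuery(), 1000, byTime);

    for (ScoreDoc sd : top.scoreDocs) {
      Document doc = searcher.doc(sd.doc);  // stored fields for this page only
      // ... return doc to the caller ...
    }
    reader.close();
  }
}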

But if, even with Lucene doing the sort, you still need to load millions of
documents per search request, you are in trouble: you need to re-formulate your
use case somehow to take advantage of what Lucene is good at (returning the top
results for a search).

Maybe you can use faceting to do whatever aggregation you are currently
doing after retrieving those millions of documents.

Maybe you could write a custom collector that uses doc values to do your own
aggregation.
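
A rough sketch of that idea against the 4.x Collector API is below; it assumes
every document was indexed with a NumericDocValuesField named "time", and the
min/max computation is only a stand-in for whatever aggregation you actually
need:

import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class TimeRangeCollector extends Collector {
  private NumericDocValues times;
  private long min = Long.MAX_VALUE;
  private long max = Long.MIN_VALUE;

  @Override
  public void setScorer(Scorer scorer) {
    // Scores are not needed for this aggregation.
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    // Per-segment, column-stride values: no stored-fields disk seek per doc.
    times = context.reader().getNumericDocValues("time");
  }

  @Override
  public void collect(int doc) {
    long t = times.get(doc);  // cheap columnar lookup, not IndexReader.document()
    if (t < min) min = t;
    if (t > max) max = t;
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;
  }
}

You would run it with searcher.search(query, new TimeRangeCollector()) and read
the aggregate off the collector afterwards.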

Mike McCandless

http://blog.mikemccandless.com

Re: lucene index reader performance

Posted by Tarun Kumar <ta...@sumologic.com>.
Thanks for the reply, Michael! In my application, I need to get millions of
documents per search.

The use case is the following: return documents in increasing order of the time
field. The client (caller) can't hold more than a few thousand docs at a time,
so it fetches all docIds with the corresponding time field for each doc, sorts
them on time, and then pulls n docs at a time. To support this use case, I am:

- getting all docIds first;
- sorting the docIds on the time field;
- querying n docIds at a time from the client, which makes an
indexReader.document(docId) call for each of the n docs on the server, combines
these docs, and returns them.

indexReader.document(docId) is the bottleneck. What alternatives do you
suggest?

Re: lucene index reader performance

Posted by Michael McCandless <lu...@mikemccandless.com>.
Are you maybe trying to load too many documents for each search request?

The IR.document API is designed to load just a few hits, like a page's worth
or ~10 documents, per search.
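
As a sketch of that intended pattern (the query is a placeholder and the page
size of 10 is just the example), stored fields are loaded only for the current
page, and searchAfter continues from the last hit:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class PagingExample {
  // Visit hits one small page at a time; only the current page's stored
  // fields are ever loaded with IndexSearcher.doc().
  static void pageThrough(IndexSearcher searcher, Query query) throws IOException {
    final int pageSize = 10;
    ScoreDoc after = null;
    while (true) {
      TopDocs page = (after == null)
          ? searcher.search(query, pageSize)
          : searcher.searchAfter(after, query, pageSize);
      if (page.scoreDocs.length == 0) {
        break;  // no more hits
      }
      for (ScoreDoc sd : page.scoreDocs) {
        Document doc = searcher.doc(sd.doc);  // a page's worth of document() calls
        // ... process doc ...
      }
      after = page.scoreDocs[page.scoreDocs.length - 1];
    }
  }
}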

Mike McCandless

http://blog.mikemccandless.com
