Posted to java-user@lucene.apache.org by Nathan Brackett <nb...@net-temps.com> on 2005/07/08 17:05:55 UTC

Search deadlocking under load

Hey all,

We're looking to use Lucene as the back end to our website and we're running
into an unusual deadlocking problem.

For testing purposes, we're just running one web server (threaded
environment) against an index mounted on an NFS share. This machine only
performs searches against the index, so nothing is writing to it. We have
tried a few different models so far:

1) Pooling IndexSearcher objects: Occasionally we would run into OutOfMemory
problems because we would not block if a request came through while all
IndexSearchers were already checked out; we would just create a temporary
one and then dispose of it once it was returned to the pool.

2) Create a new IndexSearcher each time: Every request to search would
create an IndexSearcher object. This quickly gave OutOfMemory errors, even
though we would close each one immediately afterwards.

3) Use a global IndexSearcher: This is the model we're working with now. The
model holds up fine under low to moderate load and is, in fact, much faster at
searching (probably due to some caching mechanism). Under heavy load, though,
the CPU will spike up to 99% and never come back down until we kill -9 the
process. Also, as we ramp up the load, we've discovered that search times go
up as well. Searches will generally come back within 40ms, but as the load
goes up the searches don't come back for up to 20 seconds.

We've been attempting to find where the problem is for the last week with no
luck. Our index is optimized, so there is only one file. Do we need to
synchronize access to the global IndexSearcher so that only one search can
run at a time? That poses a bit of a problem, because if a particular search
takes a long time, all others will wait. This problem does not look like an
OutOfMemory error, because the memory usage when the spike occurs is usually
in the range of 150 MB used with a ceiling of 650 MB. Is anyone else
experiencing problems like this, or does anyone have an idea where we should
be looking? Thanks.






Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Paul Smith <ps...@aconex.com>.
On 13/07/2005, at 1:34 AM, Chris Hostetter wrote:

>
> : Since this isn't in production yet, I'd rather be proven wrong now
> : rather than later! :)
>
> it sounds like what you're doing makes a lot of sense given your
> situation, and the nature of your data.
>
> the one thing you might not have considered yet, which doesn't have to
> make a big difference in your overall architecture, but might influence
> the specifics of your design, is the idea that eventually you might want
> to separate Projects onto different physical servers, letting you put
> "important" projects on their own server, so they are always available
> (even if they are the LRU).
>

Yes, thanks. Initially we won't do this until we understand more about
the usage profile and how the IndexSearchers are being aged out of the
cache.  We have a mirror index server kept in sync, and plan to put
Apache in front of the pair (as long as we can prove the two halves of
the mirror stay in sync; initially we'll just set Apache to favor one
server, with manual failover until we're completely sure).  We
eventually plan to implement index partitioning such that not all
projects sit on each server, with each server broadcasting which
projects it contains to clients.

Paul


>
>
> -Hoss


Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Chris Hostetter <ho...@fucit.org>.
: Since this isn't in production yet, I'd rather be proven wrong now
: rather than later! :)

it sounds like what you're doing makes a lot of sense given your
situation, and the nature of your data.

the one thing you might not have considered yet, which doesn't have to
make a big difference in your overall architecture, but might influence
the specifics of your design, is the idea that eventually you might want
to separate Projects onto different physical servers, letting you put
"important" projects on their own server, so they are always available
(even if they are the LRU).



-Hoss




Re: Re[2]: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Paul Smith <ps...@aconex.com>.
Many thanks for confirming the principles should work fine.  It is a  
load off my mind! :)

On index update, a small Event is put into a Buffer that is swept
periodically (every 30 seconds) to coalesce the events and then ensure
that any open IndexSearcher in the cache for an updated index is closed.
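
As a rough sketch of that coalescing step (the SearcherCache type and its
evict method are hypothetical stand-ins for our LRU cache):

    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    // Hypothetical stand-in for the LRU cache of open IndexSearchers.
    interface SearcherCache {
        void evict(String indexPath); // close + remove any searcher for this index
    }

    // Update events are buffered, then swept every 30 seconds, so an index
    // that changed many times has its cached searcher closed only once.
    public class UpdateCoalescer extends Thread {
        private final Set dirtyIndexes = new HashSet(); // paths touched since last sweep
        private final SearcherCache cache;

        public UpdateCoalescer(SearcherCache cache) { this.cache = cache; }

        public synchronized void indexUpdated(String indexPath) {
            dirtyIndexes.add(indexPath); // duplicate events coalesce in the set
        }

        public void run() {
            while (true) {
                try { Thread.sleep(30 * 1000); } catch (InterruptedException e) { return; }
                Set batch;
                synchronized (this) {
                    batch = new HashSet(dirtyIndexes);
                    dirtyIndexes.clear();
                }
                for (Iterator i = batch.iterator(); i.hasNext();) {
                    cache.evict((String) i.next()); // close any open searcher for it
                }
            }
        }
    }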

On 12/07/2005, at 4:00 AM, Otis Gospodnetic wrote:

> Paul - I'm doing the same (smaller indices) for Simpy.com for similar
> reasons (fast, independent, and faster reindexing, etc.).  Each index
> has its own IndexSearcher, and they are kept in an LRU data structure.
> Before each search the index version is checked, and a new IndexSearcher
> is created if the index has changed.
>
> Otis


Re: Re[2]: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Paul - I'm doing the same (smaller indices) for Simpy.com for similar
reasons (fast, independent, and faster reindexing, etc.).  Each index
has its own IndexSearcher, and they are kept in an LRU data structure.
Before each search the index version is checked, and a new IndexSearcher
is created if the index has changed.
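
A sketch of that arrangement, assuming LinkedHashMap's access-order mode for
the LRU part (all names and the cache bound are made up, and the sketch
ignores searches that may still be in flight on a searcher being closed):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    // LRU-bounded map of index path -> open searcher; a searcher is replaced
    // when the on-disk index version no longer matches the one it was opened at.
    public class SearcherLRU {
        private static final int MAX_OPEN = 20; // illustrative bound

        private static class Entry {
            IndexSearcher searcher;
            long version;
        }

        private final Map cache = new LinkedHashMap(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry eldest) {
                if (size() <= MAX_OPEN) return false;
                try { ((Entry) eldest.getValue()).searcher.close(); }
                catch (java.io.IOException ignored) {}
                return true; // evict the least recently used searcher
            }
        };

        public synchronized IndexSearcher get(String indexPath)
                throws java.io.IOException {
            long current = IndexReader.getCurrentVersion(indexPath);
            Entry e = (Entry) cache.get(indexPath);
            if (e == null || e.version != current) {
                if (e != null) e.searcher.close(); // index changed: reopen
                e = new Entry();
                e.searcher = new IndexSearcher(indexPath);
                e.version = current;
                cache.put(indexPath, e);
            }
            return e.searcher;
        }
    }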

Otis



Re[2]: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Sven Duzont <sv...@keljob.com>.
Hello, 

We are already using this design in production for an email job application system.
Each client (company) has an account and may have multiple users.
When a new client is created, a new Lucene index is automatically created as soon as job applications arrive for that account.
Job applications are in principle owned by users, but they can sometimes be shared with other users in the same account, so the search can be user-independent.
This design works well for us because the flow of job applications is not the same for all accounts; some Lucene indices are updated more often than others.
It also permits us to rebuild one client's index without impacting the others.

We have only one problem: when the index is updated and searched at the same time, the index may be corrupted and an exception may be thrown by the indexer ("Read past EOF"; unfortunately I don't have the stack trace at hand right now). I think this is because the search and the indexing are done in two different Java processes. We will rework the routines to lock searches while indexing is running, and vice versa.

--- sven


Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Paul Smith <ps...@aconex.com>.
On 11/07/2005, at 10:43 AM, Chris Hostetter wrote:

>
> : > Generally speaking, you only ever need one active Searcher, which
> : > all of
> : > your threads should be able to use.  (Of course, Nathan says that
> : > in his
> : > code base, doing this causes his JVM to freeze up, but I've  
> never seen
> : > this myself).
> : >
> : Thanks for your response Chris.  Do you think we are going down a
> : deadly path by having "many smaller" IndexSearchers open rather than
> : "one very large one"?
>
> I'm sorry ... i think i may have confused you, i forgot that this thread
> was regarding partitioning the index.  i meant one searcher *per index* ...
> don't try to make a separate searcher per client, or have a pool of
> searchers, or anything like that.  But if you have a need to partition
> your data into multiple indexes, then have one searcher per index.

Actually I think I confused you first, and then you confused me  
back... Let me... uhh, clarify 'ourselves'.. :)

My use of the word 'pool' was an error on my part (and a very silly  
one).  I should really have meant "LRU Cache".

We have recognized that there is a finite # of IndexSearchers that  
can probably be open at one time.  So we'll use an LRU cache to make  
sure only the 'actively' in use Searchers are open.  However there  
will only be one IndexSearcher for a given physical Index directory  
open at a time, we're just making sure only the recently used ones  
are kept open to keep memory limits sane.

>
> now assume you partition your data into two separate indexes: unless the
> way you partition your data cleanly splits the terms, so that each of the
> two indexes contains only half the number of terms of the one big index,
> sorting on a field in those two indexes will require more RAM
> than sorting on the same data in a single index.
>

Our data is logically segmented into Projects.  Each Project can
contain Documents and Mail.  So we currently have 2 physical Indexes
per Project.  90% of the time our users work within one project at a
time, and only work in "document mode" or "mail mode".  Every now and
then they may need to do a general search across all Entities and/or
Projects they are involved in (accomplished with MultiSearcher).
Perhaps we should just put Documents and Mail all in one Index for a
project (i.e. have one Index per project)?

Part of the reason to partition is to make the cost of rebuilding
a given project cheaper.  It reduces the risk of an Uber-Index being
corrupted and screwing all the users up.  We can order the reindexing
of projects to make sure our more important customers get re-indexed
first if there is a serious issue.

I would have thought that partitioning indexes would have performance  
benefits too:  a lot less data to scan (most of the data is already  
relevant).

Since this isn't in production yet, I'd rather be proven wrong now  
rather than later! :)

Thanks for your input.

Paul

Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Otis Gospodnetic <ot...@yahoo.com>.
If you want really real-time updates of search results, then yes.
However, maybe you can live with near-real-time results, in which case
you can add some logic to your application to check the index version
only every N requests/minutes/hours.
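
For example (a sketch; the counter, the threshold, and the class name are made
up, and closing the old searcher is unsafe if searches are still running on it):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    // Only check the on-disk index version every CHECK_INTERVAL requests,
    // trading a little staleness for far fewer version checks.
    public class NearRealTimeSearcher {
        private static final int CHECK_INTERVAL = 100; // hypothetical N

        private final String indexPath;
        private IndexSearcher searcher;
        private long openVersion;
        private int requestsSinceCheck = 0;

        public NearRealTimeSearcher(String indexPath) throws IOException {
            this.indexPath = indexPath;
            this.searcher = new IndexSearcher(indexPath);
            this.openVersion = IndexReader.getCurrentVersion(indexPath);
        }

        public synchronized IndexSearcher getSearcher() throws IOException {
            if (++requestsSinceCheck >= CHECK_INTERVAL) {
                requestsSinceCheck = 0;
                long current = IndexReader.getCurrentVersion(indexPath);
                if (current != openVersion) { // index changed since we opened
                    searcher.close();
                    searcher = new IndexSearcher(indexPath);
                    openVersion = current;
                }
            }
            return searcher;
        }
    }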

Otis




Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Aalap Parikh <al...@yahoo.com>.
> I don't really know a lot about what gets loaded into memory when you
> make/use a new searcher, but the one thing i've learned from experience is
> that the FieldCache (which gets used when you sort on a field) contains
> every term in the field you are sorting on, and an instance of FieldCache
> exists for each IndexReader you open (which is one big reason not to open
> a separate reader for every client).

You mentioned that re-using the same IndexSearcher would provide better
performance in terms of sorting of search results, but what if the index
I am searching is constantly being updated? Would using the same Searcher
pick up those updates (adds/removes), or would I need to instantiate a
new Searcher for each client request in order for the search results to
reflect the updated or new docs in the index?

Thanks,
Aalap.



Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Chris Hostetter <ho...@fucit.org>.
: > Generally speaking, you only ever need one active Searcher, which
: > all of
: > your threads should be able to use.  (Of course, Nathan says that
: > in his
: > code base, doing this causes his JVM to freeze up, but I've never seen
: > this myself).
: >
: Thanks for your response Chris.  Do you think we are going down a
: deadly path by having "many smaller" IndexSearchers open rather than
: "one very large one"?

I'm sorry ... i think i may have confused you, i forgot that this thread
was regarding partitioning the index.  i meant one searcher *per index* ...
don't try to make a separate searcher per client, or have a pool of
searchers, or anything like that.  But if you have a need to partition
your data into multiple indexes, then have one searcher per index.

I don't really know a lot about what gets loaded into memory when you
make/use a new searcher, but the one thing i've learned from experience is
that the FieldCache (which gets used when you sort on a field) contains
every term in the field you are sorting on, and an instance of FieldCache
exists for each IndexReader you open (which is one big reason not to open
a separate reader for every client).

now assume you partition your data into two separate indexes: unless the
way you partition your data cleanly splits the terms, so that each of the
two indexes contains only half the number of terms of the one big index,
sorting on a field in those two indexes will require more RAM
than sorting on the same data in a single index.

...that's just one example i ran into when looking at partitioning data,
i'm sure there are other cases where splitting your data up into separate
indexes isn't necessarily more efficient than using one big index.



-Hoss




Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Paul Smith <ps...@aconex.com>.
On 11/07/2005, at 9:15 AM, Chris Hostetter wrote:

>
> : Nathan's point about pooling Searchers is something that we also
> : addressed by a LRU cache mechanism.  In testing we also found that
>
> Generally speaking, you only ever need one active Searcher, which  
> all of
> your threads should be able to use.  (Of course, Nathan says that  
> in his
> code base, doing this causes his JVM to freeze up, but I've never seen
> this myself).
>
Thanks for your response Chris.  Do you think we are going down a  
deadly path by having "many smaller" IndexSearchers open rather than  
"one very large one"?

>
> As I understand it, the general rule is: if you call IndexReader.open,
> you had better call .close() on that reader.  If you construct an
> IndexSearcher using a Directory or a path, then calling .close() on the
> searcher will take care of closing the reader -- but if your code looks
> like this...
>
>     Searcher s = new IndexSearcher(IndexReader.open(foo))
>
> ...then you are screwed, because nothing will ever close that reader and
> free its resources.

That was my initial thought when Nathan outlined his issue.  I've seen
that happen before myself.

Paul

Re: Index Partitioning ( was Re: Search deadlocking under load)

Posted by Chris Hostetter <ho...@fucit.org>.
: Nathan's point about pooling Searchers is something that we also
: addressed by a LRU cache mechanism.  In testing we also found that

Generally speaking, you only ever need one active Searcher, which all of
your threads should be able to use.  (Of course, Nathan says that in his
code base, doing this causes his JVM to freeze up, but I've never seen
this myself).

I say one "active" Searcher because it might make sense in your
application to open a new searcher after new documents have been added, do
some searches on that new Searcher to "warm" FieldCache for ssorting, and
then close the old searcher and make the new Searcher available to all of
your query threads.
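
As a sketch, that warm-and-swap might look like the following (the warm-up
query, the sort field, and the class name are made up; closing the old
searcher is only safe once no search is still running on it):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.TermQuery;

    // Open a new searcher after an index update, run one sorted query against
    // it to populate the FieldCache, then swap it in and close the old one.
    public class SearcherSwapper {
        private volatile IndexSearcher active; // the searcher query threads use

        public IndexSearcher getActive() { return active; }

        public synchronized void reopen(String indexPath)
                throws java.io.IOException {
            IndexSearcher fresh = new IndexSearcher(indexPath);
            // Warm: a sorted search forces the FieldCache for the (hypothetical)
            // "date" field to be built before real queries hit this searcher.
            fresh.search(new TermQuery(new Term("contents", "warmup")),
                         new Sort("date"));
            IndexSearcher old = active;
            active = fresh;
            if (old != null) old.close();
        }
    }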

: However his 2nd point, that creating a new IndexSearcher each
: time eventually suffered OutOfMemory (even though he's closing them),
: is a worry.   Is this because an IndexSearcher can be closed, but the
: underlying IndexReader is not automatically closed?

As I understand it, the general rule is: if you call IndexReader.open, you
had better call .close() on that reader.  If you construct an IndexSearcher
using a Directory or a path, then calling .close() on the searcher will
take care of closing the reader -- but if your code looks like this...

    Searcher s = new IndexSearcher(IndexReader.open(foo))

...then you are screwed, because nothing will ever close that reader and
free its resources.
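
In other words, a sketch of the three cases (the path is made up):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    public class CloseExample {
        public static void main(String[] args) throws Exception {
            String foo = "/path/to/index"; // illustrative

            // Fine: the searcher opened the reader, so close() releases it.
            IndexSearcher s1 = new IndexSearcher(foo);
            s1.close();

            // Leaky, as described above: nothing closes the anonymous reader.
            IndexSearcher s2 = new IndexSearcher(IndexReader.open(foo));
            s2.close(); // closes the searcher, but NOT the reader it wraps

            // Safe variant: keep the reader reference and close it yourself.
            IndexReader reader = IndexReader.open(foo);
            IndexSearcher s3 = new IndexSearcher(reader);
            s3.close();
            reader.close();
        }
    }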



-Hoss




Index Partitioning ( was Re: Search deadlocking under load)

Posted by Paul Smith <ps...@aconex.com>.
Nathan, first apologies for somewhat hijacking your thread, but I  
believe my question to be very related.

Nathan's Scenario 1 is the one we're effectively employing (or in the  
process of setting up).  Rather than 1 Index To Rule Them All, I have  
decided to partition the index structure.  Users tend to focus on a  
Project concept at a time, and within each Project, they have  
Documents and Mail (and some other types we'll eventually index, we  
call them 'entities' to be generic).  So I am creating an Index for  
each Project-Entity.  We should still be able to search across all
entities for a given project (or even across all of them) by using
MultiSearcher.  However I believed it would be faster to have
separate indices (each one is a much smaller index to search).
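
A sketch of that cross-entity search (the index paths and the query are made up):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;

    public class ProjectSearch {
        public static void main(String[] args) throws Exception {
            // One small index per project-entity; the paths are illustrative.
            Searchable[] perEntity = {
                new IndexSearcher("/indexes/project42/documents"),
                new IndexSearcher("/indexes/project42/mail"),
            };
            MultiSearcher all = new MultiSearcher(perEntity);

            Hits hits = all.search(
                QueryParser.parse("contract", "contents", new StandardAnalyzer()));
            System.out.println(hits.length() + " hits across both entities");
            all.close(); // also closes the underlying searchers
        }
    }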

Otis (and anyone else), are you suggesting this design is not  
something we should employ?

Nathan's point about pooling Searchers is something that we also  
addressed by a LRU cache mechanism.  In testing we also found that  
there was an upper limit on the number of IndexSearchers that can be  
open at one time, and so I can see why he suffered OOM with creating  
temporary searchers for those requests outside the current pool-set.   
However his 2nd point, that creating a new IndexSearcher each
time eventually suffered OutOfMemory (even though he's closing them),
is a worry.   Is this because an IndexSearcher can be closed, but the
underlying IndexReader is not automatically closed?

Appreciate any thoughts on this.  I'd rather know now while I have  
the opportunity to change the design than later when in production..  :)

cheers,

Paul Smith

On 09/07/2005, at 5:39 AM, Otis Gospodnetic wrote:

> Nathan,
>
> 3) is the recommended usage.
> Your index is on an NFS share, which means you are searching it over
> the network.  Make it local, and you should see performance
> improvements.  Local or remote, it makes sense that searches take
> longer to execute as the load goes up.  Yes, it shouldn't deadlock.
> You shouldn't need to synchronize access to IndexSearcher.
> When your JVM locks up next time, kill it, get the thread dump, and
> send it to the list, so we can try to remove the bottleneck, if that's
> possible.
>
> How many queries/second do you run, and what kinds of queries are  
> they,
> how big is your index and what kind of hardware (disks, RAM, CPU) are
> you using?
>
> Otis


RE: Search deadlocking under load

Posted by Robert Engels <re...@ix.netcom.com>.
I had posted an NioFile and caching system that greatly increases the
parallelism of Lucene. On some platforms (Windows), though, the low-level
NIO channel is not completely thread-safe, so it can still block, although
the code has some work-arounds for this problem.

You can never achieve "100% parallel": a thread will block doing disk IO
at some point in the driver (unless everything is in the disk cache), and
even setting that aside, unless you have as many processors as threads
there will always be some blocking/waiting.

If the time to perform a search is longer than the interval between
incoming requests, you will keep accumulating threads unless you limit the
number of threads accepting requests in the first place.
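
One way to cap the accepting threads, as a sketch (pool and queue sizes are
made up; this uses java.util.concurrent rather than anything Lucene-specific):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    // A fixed-size pool with a bounded queue, so load beyond capacity applies
    // back-pressure instead of spawning ever more searching threads.
    public class BoundedSearchPool {
        public static ThreadPoolExecutor create() {
            return new ThreadPoolExecutor(
                8, 8,                       // fixed number of worker threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue(64), // bounded backlog of requests
                new ThreadPoolExecutor.CallerRunsPolicy()); // overload runs in caller
        }
    }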

R



-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Wednesday, July 13, 2005 5:53 PM
To: java-dev@lucene.apache.org
Cc: Nathan Brackett
Subject: RE: Search deadlocking under load


This may be better for java-dev@...

I've looked at the source of that method, but I don't see a way of
removing that synchronized block.  Maybe somebody else has ideas, but
it looks like the synchronization is there to ensure the file being
read is read in fully, without some other thread modifying it "under
the reader's feet".

Otis

--- Nathan Brackett <nb...@net-temps.com> wrote:

> Otis,
>
> After further testing it turns out that the 'deadlock' we're
> encountering is
> not a deadlock at all, but a result of resin hitting its maximum
> number of
> allowed threads.  We bumped up the max-threads in the config and it
> fixed
> the problem for a certain amount of load, but we'd much prefer to go
> after
> the source of the problem, namely:
>
> As the number of threads hitting lucene increases, contention for locks
> increases, meaning the average response time increases.  This places us
> in a downward spiral of performance: as the incoming number of hits per
> second stays constant and the response time increases, the total number
> of threads inside resin doing work will increase.  This problem
> compounds itself, escalating the number of threads in resin until we
> crash.
>
>
> Admittedly this is a pretty harsh test (~20 hits per second triggering
> complex searches, which starts fine but then escalates to > 150 threads
> as processing slows down while the number of incoming hits per second
> does not).
>
> Our ultimate goal, however, is to have each search be completely and
> 100%
> parallel.
>
> The point of contention seems to be the method below:
>
> FSDirectory.java:486 (class FSInputStream)
>
>
>
>   protected final void readInternal(byte[] b, int offset, int len)
>       throws IOException {
>     synchronized (file) {
>       long position = getFilePointer();
>       if (position != file.position) {
>         file.seek(position);
>         file.position = position;
>       }
>       int total = 0;
>       do {
>         int i = file.read(b, offset + total, len - total);
>         if (i == -1)
>           throw new IOException("read past EOF");
>         file.position += i;
>         total += i;
>       } while (total < len);
>     }
>   }
>
>
>
>
> The threads are usually all lined up to reach this.  Why are so many
> threads
> backed up behind the same instance of FSInputStream.readInternal?
> Shouldn't
> each search have a different input stream?  What would you suggest as
> the
> best path to achieve 100% parallel searching?  Here's a sample of our
> thread
> dump, you can see 2 threads waiting for the same
> FSInputStream$Descriptor
> (which is the synchronized(file) above):
>
> "tcpConnection-8080-11" daemon prio=5 tid=0x08304600 nid=0x8304800
> waiting
> for monitor entry [bf494000..bf494d08]
>         at
>
org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:412)
>         - waiting to lock <0x2f2b7a38> (a
> org.apache.lucene.store.FSInputStream$Descriptor)
>         at
> org.apache.lucene.store.InputStream.refill(InputStream.java:158)
>         at
> org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
>         at
> org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
>         at
>
org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
>         at
> org.apache.lucene.search.TermScorer.next(TermScorer.java:55)
>         at
> org.apache.lucene.search.BooleanScorer.next(BooleanScorer.java:112)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:37)
>         at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
>         at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
>         at org.apache.lucene.search.Hits.<init>(Hits.java:43)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:33)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:27)
>         at
>
com.nettemps.search.backend.SingleIndexManager.search(SingleIndexManager.jav
> a:335)
>         at
>
com.nettemps.search.backend.IndexAccessControl.doSearch(IndexAccessControl.j
> ava:100)
>
> "tcpConnection-8080-10" daemon prio=5 tid=0x08336800 nid=0x8336a00
> waiting
> for monitor entry [bf4d5000..bf4d5d08]
>         at
>
org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:412)
>         - waiting to lock <0x2f2b7a38> (a
> org.apache.lucene.store.FSInputStream$Descriptor)
>         at
> org.apache.lucene.store.InputStream.refill(InputStream.java:158)
>         at
> org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
>         at
> org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
>         at
>
org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
>         at
> org.apache.lucene.search.TermScorer.next(TermScorer.java:55)
>         at
> org.apache.lucene.search.BooleanScorer.next(BooleanScorer.java:112)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:37)
>         at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
>         at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
>         at org.apache.lucene.search.Hits.<init>(Hits.java:43)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:33)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:27)
>         at
>
com.nettemps.search.backend.SingleIndexManager.search(SingleIndexManager.jav
> a:335)
>
> -----Original Message-----
> From: Nathan Brackett [mailto:nbrackett@net-temps.com]
> Sent: Monday, July 11, 2005 5:43 PM
> To: java-user@lucene.apache.org
> Subject: RE: Search deadlocking under load
>
>
> Thanks for the advice. That ought to reduce contention a bit in that
> particular method.
>
> I've been reviewing a large amount of thread dumps today and I was
> wondering
> if it's common to see many threads that look like this:
>
> "tcpConnection-8080-20" daemon prio=5 tid=0x081ba000 nid=0x810ac00
> waiting
> for monitor entry [bf24b000..bf24bd20]
>         at
>
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(Compou
> ndFileReader.java:216)
>         - waiting to lock <0x2ee24c48> (a
> org.apache.lucene.store.FSInputStream)
>
> When I get the deadlock situation, I often see a few of these lying
> around, but no matching thread that actually holds the lock on
> 0x2ee24c48 anywhere in the dump. Is this normal? I'm not really a
> thread dump pro.
>
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Monday, July 11, 2005 1:57 PM
> To: java-user@lucene.apache.org
> Subject: RE: Search deadlocking under load
>
>
> Hi Nick,
>
> Without looking at the source of that method, I'd suggest first
> trying
> the multifile index format (you can easily convert to it by setting
> the
> new format on IndexWriter and optimizing it).  I'd be interested to
> know if this eliminates the problem, or at least makes it harder to
> hit.
>
> Otis
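
For reference, a sketch of that conversion (assuming IndexWriter's
setUseCompoundFile switch and an illustrative path):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    // Reopen the index for writing, switch off the compound-file format, and
    // optimize so the whole index is rewritten in the multifile format.
    public class ConvertToMultifile {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter(
                "/path/to/index", new StandardAnalyzer(), false); // false = append
            writer.setUseCompoundFile(false);
            writer.optimize();
            writer.close();
        }
    }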
>
>
> --- Nathan Brackett <nb...@net-temps.com> wrote:
>
> > Hey Otis,
> >
> > Thanks for the hasty response and apologies for my delayed
> response.
> > It was
> > Friday and time to go :)
> >
> > The queries we're running are very varied (wildcard, phrase,
> normal).
> > The
> > index is only about a 1/2 gig in size (maybe 250,000 documents).
> The
> > machine
> > is running FreeBSD 5.3 with ~2 gig RAM.
> >
> > I got a thread dump from right around the time that the process
> would
> > deadlock and not come back and I noticed that almost all of the
> > threads were
> > waiting on the same method. Here's what the trace looks like:
> (small
> > sample
> > for the sake of brevity...the real dump is huge)
> >
> > tcpConnection-8080-32:
> >   [1] org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal (CompoundFileReader.java:217)
> >   [2] org.apache.lucene.store.InputStream.refill (InputStream.java:158)
> >   [3] org.apache.lucene.store.InputStream.readByte (InputStream.java:43)
> >   [4] org.apache.lucene.store.InputStream.readVInt (InputStream.java:83)
> >   [5] org.apache.lucene.index.SegmentTermDocs.read (SegmentTermDocs.java:126)
> >   [6] org.apache.lucene.search.TermScorer.next (TermScorer.java:55)
> >   [7] org.apache.lucene.search.BooleanScorer.next (BooleanScorer.java:112)
> >   [8] org.apache.lucene.search.Scorer.score (Scorer.java:37)
> >   [9] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:92)
> >   [10] org.apache.lucene.search.Hits.getMoreDocs (Hits.java:64)
> >   [11] org.apache.lucene.search.Hits.<init> (Hits.java:43)
> >   [12] org.apache.lucene.search.Searcher.search (Searcher.java:33)
> >   [13] org.apache.lucene.search.Searcher.search (Searcher.java:27)
> >   [14] com.nettemps.search.backend.SingleIndexManager.search (SingleIndexManager.java:335)
> >   [15] com.nettemps.search.backend.IndexAccessControl.doSearch (IndexAccessControl.java:100)
> >   [16] com.nettemps.search.server.SearchServerImpl.searchResumes (SearchServerImpl.java:402)
> >   [17] com.nettemps.search.server.SearchServerReadOnly_Tie.invoke_searchResumes (SearchServerReadOnly_Tie.java:93)
> >   [18] com.nettemps.search.server.SearchServerReadOnly_Tie.processingHook (SearchServerReadOnly_Tie.java:298)
> >   [19] com.sun.xml.rpc.server.StreamingHandler.handle (StreamingHandler.java:321)
> >   [20] com.sun.xml.rpc.server.http.JAXRPCServletDelegate.doPost (JAXRPCServletDelegate.java:443)
> >   [21] com.sun.xml.rpc.server.http.JAXRPCServlet.doPost (JAXRPCServlet.java:102)
> >   [22] javax.servlet.http.HttpServlet.service (HttpServlet.java:165)
> >   [23] javax.servlet.http.HttpServlet.service (HttpServlet.java:103)
> >   [24] com.caucho.server.http.FilterChainServlet.doFilter (FilterChainServlet.java:96)
> >   [25] com.caucho.server.http.Invocation.service (Invocation.java:315)
> >   [26] com.caucho.server.http.CacheInvocation.service (CacheInvocation.java:135)
> >   [27] com.caucho.server.http.HttpRequest.handleRequest (HttpRequest.java:253)
> >   [28] com.caucho.server.http.HttpRequest.handleConnection (HttpRequest.java:170)
> >   [29] com.caucho.server.TcpConnection.run (TcpConnection.java:139)
> >   [30] java.lang.Thread.run (Thread.java:534)
> >
> > I took a look at that readInternal method and saw that the
> contention
> > is
> > around an InputStream that I assume reads from the actual index
> file
> > and
> > returns it for use by the method. We are running many threads that
> > are
> > attempting to do searches at the same time (roughly 30 - 35), so
> that
> > explains why the search times would go up.
> >
> > In an attempt to reduce the amount of contention, we synchronized
> our
> > search
> > method (the one that makes the actual call to Lucene's search: [14]
> > com.nettemps.search.backend.SingleIndexManager.search
> > (SingleIndexManager.java:335)). This also caused the same results
> > when hit
> > with too many threads.
> >
> > We're really stuck at this point as to what to try. Any advice?
> >
> >
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Friday, July 08, 2005 3:40 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Search deadlocking under load
> >
> >
> > Nathan,
> >
> > 3) is the recommended usage.
> > Your index is on an NFS share, which means you are searching it
> over
> > the network.  Make it local, and you should see performance
> > improvements.  Local or remove, it makes sense that searches take
> > longer to execute, and the load goes up.  Yes, it shouldn't
> deadlock.
> > You shouldn't need to synchronize access to IndexSearcher.
> > When your JVM locks up next time, kill it, get the thread dump, and
> > send it to the list, so we can try to remove the bottleneck, if
> > that's
> > possible.
> >
> > How many queries/second do you run, and what kinds of queries are
> > they,
> > how big is your index and what kind of hardware (disks, RAM, CPU)
> are
> > you using?
> >
> > Otis
> >
> > --- Nathan Brackett <nb...@net-temps.com> wrote:
> >
> > > Hey all,
> > >
> > > We're looking to use Lucene as the back end to our website and
> > we're
> > > running
> > > into an unusual deadlocking problem.
> > >
> > > For testing purposes, we're just running one web server (threaded
> > > environment) against an index mounted on an NFS share. This
> machine
> > > performs
> > > searches only against this index so it's not being touched. We
> have
> > > tried a
> > > few different models so far:
> > >
> > > 1) Pooling IndexSearcher objects: Occasionally we would run into
> > > OutOfMemory
> > > problems as we would not block if a request came through and all
> > > IndexSearchers were already checked out, we would just create a
> > > temporary
> > > one and then dispose of it once it was returned to the pool.
> > >
> > > 2) Create a new IndexSearcher each time: Every request to search
> > > would
> > > create an IndexSearcher object. This quickly gave OutOfMemory
> > errors,
> > > even
> > > when we would close them out directly after.
> > >
> > > 3) Use a global IndexSearcher: This is the model we're working
> with
> > > now. The
> > > model holds up fine under low-moderate load and is, in fact, much
> > > faster at
> > > searching (probably due to some caching mechanism). Under heavy
> > load
> > > though,
> > > the CPU will spike up to 99% and never come back down until we
> kill
> > > -9 the
> > > process. Also, as you ramp the load, we've discovered that search
> > > times go
> > > up as well. Searches will generally come back after 40ms, but as
> > the
> > > load
> > > goes up the searches don't come back for up to 20 seconds.
> > >
> > > We've been attempting to find where the problem is for the last
> > week
> > > with no
> > > luck. Our index is optimized, so there is only one file. Do we
> need
> > > to
> > > synchronize access to the global IndexSearcher so that only one
> > > search can
> > > run at a time? That poses a bit of a problem as if a particular
> > > search takes
> > > a long time, all others will wait. This problem does not look
> like
> > an
> > > OutOrMemory error because the memory usage when the spike occurs
> is
> > > usually
> > > in the range of 150meg used with a ceiling of 650meg. Anyone else
> > > experiencing any problems like this or have any idea where we
> > should
> > > be
> > > looking? Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org




RE: Search deadlocking under load

Posted by Otis Gospodnetic <ot...@yahoo.com>.
This may be better for java-dev@...

I've looked at the source of that method, but I don't see a way of
removing that synchronized block.  Maybe somebody else has ideas, but
it looks like the synchronization is there to ensure the file being
read is read in fully, without some other thread modifying it "under
the reader's feet".
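
To make the hazard concrete, here is a minimal sketch of the situation
that lock guards against (class and method names are mine, not Lucene's):
seek() and read() are two separate calls on the one RandomAccessFile that
every clone of the stream shares, so without the lock another thread could
move the shared file pointer in between.

  import java.io.IOException;
  import java.io.RandomAccessFile;

  // Illustration only: mirrors the locking in FSInputStream.readInternal.
  public class SharedDescriptorReader {

      private final RandomAccessFile file;

      public SharedDescriptorReader(String path) throws IOException {
          this.file = new RandomAccessFile(path, "r");
      }

      // The seek-then-read sequence must be atomic per caller; without
      // the lock, a concurrent reader could reposition the shared file
      // pointer between our seek() and our read().
      public int readAt(long position, byte[] buf, int offset, int len)
              throws IOException {
          synchronized (file) {
              file.seek(position);
              return file.read(buf, offset, len);
          }
      }
  }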

Otis

--- Nathan Brackett <nb...@net-temps.com> wrote:

> Otis,
> 
> After further testing it turns out that the 'deadlock' we're
> encountering is
> not a deadlock at all, but a result of resin hitting its maximum
> number of
> allowed threads.  We bumped up the max-threads in the config and it
> fixed
> the problem for a certain amount of load, but we'd much prefer to go
> after
> the source of the problem, namely:
> 
> As the number of threads hitting Lucene increases, contention for
> locks increases, meaning the average response time increases. This
> places us in a downward spiral of performance: as the incoming number
> of hits per second stays constant while the response time grows, the
> total number of threads inside resin doing work will increase. This
> problem compounds itself, escalating the number of threads in resin
> until we crash.
> 
> 
> Admittedly this is a pretty harsh test (~20 hits per second triggering
> complex searches, which starts fine but then escalates to > 150 threads
> as processing slows down while the number of incoming hits per second
> does not).
> 
> Our ultimate goal, however, is to have each search be completely and
> 100%
> parallel.
> 
> The point of contention seems to be the method below:
> 
> FSDirectory.java:486 (class FSInputStream)
> 
> 
> 
>   protected final void readInternal(byte[] b, int offset, int len)
>   		throws IOException {
>   	synchronized (file) {
>   		long position = getFilePointer();
>   		if (position != file.position) {
>   			file.seek(position);
>   			file.position = position;
>   		}
>   		int total = 0;
>   		do {
>   			int i = file.read(b, offset+total, len-total);
>   			if (i == -1)
>   				throw new IOException("read past EOF");
>   			file.position += i;
>   			total += i;
>   		} while (total < len);
>   	}
>   }
> 
> 
> 
> 
> The threads are usually all lined up to reach this.  Why are so many
> threads
> backed up behind the same instance of FSInputStream.readInternal? 
> Shouldn't
> each search have a different input stream?  What would you suggest as
> the
> best path to achieve 100% parallel searching?  Here's a sample of our
> thread
> dump; you can see 2 threads waiting for the same
> FSInputStream$Descriptor
> (which is the synchronized(file) above):
> 
> "tcpConnection-8080-11" daemon prio=5 tid=0x08304600 nid=0x8304800
> waiting
> for monitor entry [bf494000..bf494d08]
>         at
>
org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:412)
>         - waiting to lock <0x2f2b7a38> (a
> org.apache.lucene.store.FSInputStream$Descriptor)
>         at
> org.apache.lucene.store.InputStream.refill(InputStream.java:158)
>         at
> org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
>         at
> org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
>         at
>
org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
>         at
> org.apache.lucene.search.TermScorer.next(TermScorer.java:55)
>         at
> org.apache.lucene.search.BooleanScorer.next(BooleanScorer.java:112)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:37)
>         at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
>         at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
>         at org.apache.lucene.search.Hits.<init>(Hits.java:43)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:33)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:27)
>         at
> com.nettemps.search.backend.SingleIndexManager.search(SingleIndexManager.java:335)
>         at
> com.nettemps.search.backend.IndexAccessControl.doSearch(IndexAccessControl.java:100)
> 
> "tcpConnection-8080-10" daemon prio=5 tid=0x08336800 nid=0x8336a00
> waiting
> for monitor entry [bf4d5000..bf4d5d08]
>         at
>
org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:412)
>         - waiting to lock <0x2f2b7a38> (a
> org.apache.lucene.store.FSInputStream$Descriptor)
>         at
> org.apache.lucene.store.InputStream.refill(InputStream.java:158)
>         at
> org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
>         at
> org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
>         at
>
org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
>         at
> org.apache.lucene.search.TermScorer.next(TermScorer.java:55)
>         at
> org.apache.lucene.search.BooleanScorer.next(BooleanScorer.java:112)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:37)
>         at
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
>         at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
>         at org.apache.lucene.search.Hits.<init>(Hits.java:43)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:33)
>         at org.apache.lucene.search.Searcher.search(Searcher.java:27)
>         at
> com.nettemps.search.backend.SingleIndexManager.search(SingleIndexManager.java:335)
> 
> -----Original Message-----
> From: Nathan Brackett [mailto:nbrackett@net-temps.com]
> Sent: Monday, July 11, 2005 5:43 PM
> To: java-user@lucene.apache.org
> Subject: RE: Search deadlocking under load
> 
> 
> Thanks for the advice. That ought to reduce contention a bit in that
> particular method.
> 
> I've been reviewing a large number of thread dumps today and I was
> wondering
> if it's common to see many threads that look like this:
> 
> "tcpConnection-8080-20" daemon prio=5 tid=0x081ba000 nid=0x810ac00
> waiting
> for monitor entry [bf24b000..bf24bd20]
>         at
> org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:216)
>         - waiting to lock <0x2ee24c48> (a
> org.apache.lucene.store.FSInputStream)
> 
> When I get the deadlock situation, I often see a few of these lying
> around,
> but no matching thread that actually has the lock on 0x2ee24c48 in
> the dump.
> Is this normal? I'm not really a thread dump pro.
> 
> 
> 
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Monday, July 11, 2005 1:57 PM
> To: java-user@lucene.apache.org
> Subject: RE: Search deadlocking under load
> 
> 
> Hi Nathan,
> 
> Without looking at the source of that method, I'd suggest first
> trying
> the multifile index format (you can easily convert to it by setting
> the
> new format on IndexWriter and optimizing it).  I'd be interested to
> know if this eliminates the problem, or at least makes it harder to
> hit.
> 
> Otis
> 
> 
> --- Nathan Brackett <nb...@net-temps.com> wrote:
> 
> > Hey Otis,
> >
> > Thanks for the speedy response and apologies for my delayed
> response.
> > It was
> > Friday and time to go :)
> >
> > The queries we're running are very varied (wildcard, phrase,
> normal).
> > The
> > index is only about a 1/2 gig in size (maybe 250,000 documents).
> The
> > machine
> > is running FreeBSD 5.3 with ~2 gig RAM.
> >
> > I got a thread dump from right around the time that the process
> would
> > deadlock and not come back and I noticed that almost all of the
> > threads were
> > waiting on the same method. Here's what the trace looks like:
> (small
> > sample
> > for the sake of brevity...the real dump is huge)
> >
> > tcpConnection-8080-32:
> >   [1]
> >
> org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal
> > (CompoundFileReader.java:217)
> >   [2] org.apache.lucene.store.InputStream.refill
> > (InputStream.java:158)
> >   [3] org.apache.lucene.store.InputStream.readByte
> > (InputStream.java:43)
> >   [4] org.apache.lucene.store.InputStream.readVInt
> > (InputStream.java:83)
> >   [5] org.apache.lucene.index.SegmentTermDocs.read
> > (SegmentTermDocs.java:126)
> >   [6] org.apache.lucene.search.TermScorer.next (TermScorer.java:55)
> >   [7] org.apache.lucene.search.BooleanScorer.next
> > (BooleanScorer.java:112)
> >   [8] org.apache.lucene.search.Scorer.score (Scorer.java:37)
> >   [9] org.apache.lucene.search.IndexSearcher.search
> > (IndexSearcher.java:92)
> >   [10] org.apache.lucene.search.Hits.getMoreDocs (Hits.java:64)
> >   [11] org.apache.lucene.search.Hits.<init> (Hits.java:43)
> >   [12] org.apache.lucene.search.Searcher.search (Searcher.java:33)
> >   [13] org.apache.lucene.search.Searcher.search (Searcher.java:27)
> >   [14] com.nettemps.search.backend.SingleIndexManager.search
> > (SingleIndexManager.java:335)
> >   [15] com.nettemps.search.backend.IndexAccessControl.doSearch
> > (IndexAccessControl.java:100)
> >   [16] com.nettemps.search.server.SearchServerImpl.searchResumes
> > (SearchServerImpl.java:402)
> >   [17]
> >
>
com.nettemps.search.server.SearchServerReadOnly_Tie.invoke_searchResumes
> > (SearchServerReadOnly_Tie.java:93)
> >   [18]
> > com.nettemps.search.server.SearchServerReadOnly_Tie.processingHook
> > (SearchServerReadOnly_Tie.java:298)
> >   [19] com.sun.xml.rpc.server.StreamingHandler.handle
> > (StreamingHandler.java:321)
> >   [20] com.sun.xml.rpc.server.http.JAXRPCServletDelegate.doPost
> > (JAXRPCServletDelegate.java:443)
> >   [21] com.sun.xml.rpc.server.http.JAXRPCServlet.doPost
> > (JAXRPCServlet.java:102)
> >   [22] javax.servlet.http.HttpServlet.service
> (HttpServlet.java:165)
> >   [23] javax.servlet.http.HttpServlet.service
> (HttpServlet.java:103)
> >   [24] com.caucho.server.http.FilterChainServlet.doFilter
> > (FilterChainServlet.java:96)
> >   [25] com.caucho.server.http.Invocation.service
> > (Invocation.java:315)
> >   [26] com.caucho.server.http.CacheInvocation.service
> > (CacheInvocation.java:135)
> >   [27] com.caucho.server.http.HttpRequest.handleRequest
> > (HttpRequest.java:253)
> >   [28] com.caucho.server.http.HttpRequest.handleConnection
> > (HttpRequest.java:170)
> >   [29] com.caucho.server.TcpConnection.run (TcpConnection.java:139)
> >   [30] java.lang.Thread.run (Thread.java:534)
> >
> > I took a look at that readInternal method and saw that the
> contention
> > is
> > around an InputStream that I assume reads from the actual index
> file
> > and
> > returns it for use by the method. We are running many threads that
> > are
> > attempting to do searches at the same time (roughly 30 - 35), so
> that
> > explains why the search times would go up.
> >
> > In an attempt to reduce the amount of contention, we synchronized
> our
> > search
> > method (the one that makes the actual call to Lucene's search: [14]
> > com.nettemps.search.backend.SingleIndexManager.search
> > (SingleIndexManager.java:335)). This also caused the same results
> > when hit
> > with too many threads.
> >
> > We're really stuck at this point as to what to try. Any advice?
> >
> >
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Friday, July 08, 2005 3:40 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Search deadlocking under load
> >
> >
> > Nathan,
> >
> > 3) is the recommended usage.
> > Your index is on an NFS share, which means you are searching it
> over
> > the network.  Make it local, and you should see performance
> > improvements.  Local or remote, it makes sense that searches take
> > longer to execute as the load goes up.  Yes, it shouldn't
> deadlock.
> > You shouldn't need to synchronize access to IndexSearcher.
> > When your JVM locks up next time, kill it, get the thread dump, and
> > send it to the list, so we can try to remove the bottleneck, if
> > that's
> > possible.
> >
> > How many queries/second do you run, and what kinds of queries are
> > they,
> > how big is your index and what kind of hardware (disks, RAM, CPU)
> are
> > you using?
> >
> > Otis
> >
> > --- Nathan Brackett <nb...@net-temps.com> wrote:
> >
> > > Hey all,
> > >
> > > We're looking to use Lucene as the back end to our website and
> > we're
> > > running
> > > into an unusual deadlocking problem.
> > >
> > > For testing purposes, we're just running one web server (threaded
> > > environment) against an index mounted on an NFS share. This
> machine
> > > performs
> > > searches only against this index so it's not being touched. We
> have
> > > tried a
> > > few different models so far:
> > >
> > > 1) Pooling IndexSearcher objects: Occasionally we would run into
> > > OutOfMemory
> > > problems as we would not block if a request came through and all
> > > IndexSearchers were already checked out, we would just create a
> > > temporary
> > > one and then dispose of it once it was returned to the pool.
> > >
> > > 2) Create a new IndexSearcher each time: Every request to search
> > > would
> > > create an IndexSearcher object. This quickly gave OutOfMemory
> > errors,
> > > even
> > > when we would close them out directly after.
> > >
> > > 3) Use a global IndexSearcher: This is the model we're working
> with
> > > now. The
> > > model holds up fine under low-moderate load and is, in fact, much
> > > faster at
> > > searching (probably due to some caching mechanism). Under heavy
> > load
> > > though,
> > > the CPU will spike up to 99% and never come back down until we
> kill
> > > -9 the
> > > process. Also, as you ramp the load, we've discovered that search
> > > times go
> > > up as well. Searches will generally come back after 40ms, but as
> > the
> > > load
> > > goes up the searches don't come back for up to 20 seconds.
> > >
> > > We've been attempting to find where the problem is for the last
> > week
> > > with no
> > > luck. Our index is optimized, so there is only one file. Do we
> need
> > > to
> > > synchronize access to the global IndexSearcher so that only one
> > > search can
> > > run at a time? That poses a bit of a problem as if a particular
> > > search takes
> > > a long time, all others will wait. This problem does not look
> like
> > an
> > > OutOfMemory error because the memory usage when the spike occurs
> is
> > > usually
> > > in the range of 150meg used with a ceiling of 650meg. Anyone else
> > > experiencing any problems like this or have any idea where we
> > should
> > > be
> > > looking? Thanks.
> >
> >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


RE: Search deadlocking under load

Posted by Nathan Brackett <nb...@net-temps.com>.
Otis,

After further testing it turns out that the 'deadlock' we're encountering is
not a deadlock at all, but a result of resin hitting its maximum number of
allowed threads.  We bumped up the max-threads in the config and it fixed
the problem for a certain amount of load, but we'd much prefer to go after
the source of the problem, namely:

As the number of threads hitting Lucene increases, contention for locks
increases, meaning the average response time increases. This places us in a
downward spiral of performance: as the incoming number of hits per second
stays constant while the response time grows, the total number of threads
inside resin doing work will increase. This problem compounds itself,
escalating the number of threads in resin until we crash.


Admittedly this is a pretty harsh test (~20 hits per second triggering
complex searches, which starts fine but then escalates to > 150 threads as
processing slows down while the number of incoming hits per second does not).

Our ultimate goal, however, is to have each search be completely and 100%
parallel.

The point of contention seems to be the method below:

FSDirectory.java:486 (class FSInputStream)



  protected final void readInternal(byte[] b, int offset, int len)
          throws IOException {
      // Every clone of this stream shares one underlying descriptor,
      // so the whole seek-then-read sequence must be atomic per caller.
      synchronized (file) {
          long position = getFilePointer();
          if (position != file.position) {      // skip redundant seeks
              file.seek(position);
              file.position = position;
          }
          int total = 0;
          do {                                  // loop until len bytes are read
              int i = file.read(b, offset + total, len - total);
              if (i == -1)
                  throw new IOException("read past EOF");
              file.position += i;
              total += i;
          } while (total < len);
      }
  }




The threads are usually all lined up to reach this.  Why are so many threads
backed up behind the same instance of FSInputStream.readInternal?  Shouldn't
each search have a different input stream?  What would you suggest as the
best path to achieve 100% parallel searching?  Here's a sample of our thread
dump; you can see 2 threads waiting for the same FSInputStream$Descriptor
(which is the synchronized(file) above):

"tcpConnection-8080-11" daemon prio=5 tid=0x08304600 nid=0x8304800 waiting
for monitor entry [bf494000..bf494d08]
        at
org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:412)
        - waiting to lock <0x2f2b7a38> (a
org.apache.lucene.store.FSInputStream$Descriptor)
        at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
        at
org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
        at org.apache.lucene.search.TermScorer.next(TermScorer.java:55)
        at
org.apache.lucene.search.BooleanScorer.next(BooleanScorer.java:112)
        at org.apache.lucene.search.Scorer.score(Scorer.java:37)
        at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
        at org.apache.lucene.search.Hits.<init>(Hits.java:43)
        at org.apache.lucene.search.Searcher.search(Searcher.java:33)
        at org.apache.lucene.search.Searcher.search(Searcher.java:27)
        at
com.nettemps.search.backend.SingleIndexManager.search(SingleIndexManager.java:335)
        at
com.nettemps.search.backend.IndexAccessControl.doSearch(IndexAccessControl.java:100)

"tcpConnection-8080-10" daemon prio=5 tid=0x08336800 nid=0x8336a00 waiting
for monitor entry [bf4d5000..bf4d5d08]
        at
org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:412)
        - waiting to lock <0x2f2b7a38> (a
org.apache.lucene.store.FSInputStream$Descriptor)
        at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
        at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
        at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
        at
org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:126)
        at org.apache.lucene.search.TermScorer.next(TermScorer.java:55)
        at
org.apache.lucene.search.BooleanScorer.next(BooleanScorer.java:112)
        at org.apache.lucene.search.Scorer.score(Scorer.java:37)
        at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
        at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
        at org.apache.lucene.search.Hits.<init>(Hits.java:43)
        at org.apache.lucene.search.Searcher.search(Searcher.java:33)
        at org.apache.lucene.search.Searcher.search(Searcher.java:27)
        at
com.nettemps.search.backend.SingleIndexManager.search(SingleIndexManager.java:335)
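
For what it's worth, one way to get fully independent reads would be to
give each thread its own descriptor on the index file. A rough sketch
(names are hypothetical, and this is not how Lucene's FSInputStream
clones behave; clones share one descriptor, which is exactly the
contention shown above):

  import java.io.IOException;
  import java.io.RandomAccessFile;

  // Sketch: one RandomAccessFile per thread, so no shared lock is needed.
  public class PerThreadFileReader {

      private final String path;
      private final ThreadLocal files = new ThreadLocal();

      public PerThreadFileReader(String path) {
          this.path = path;
      }

      private RandomAccessFile file() throws IOException {
          RandomAccessFile f = (RandomAccessFile) files.get();
          if (f == null) {
              f = new RandomAccessFile(path, "r"); // private descriptor
              files.set(f);
          }
          return f;
      }

      // Each thread seeks and reads on its own descriptor, so callers
      // never contend on a monitor here.
      public void readAt(long position, byte[] b, int offset, int len)
              throws IOException {
          RandomAccessFile f = file();
          f.seek(position);
          int total = 0;
          while (total < len) {
              int i = f.read(b, offset + total, len - total);
              if (i == -1)
                  throw new IOException("read past EOF");
              total += i;
          }
      }
  }

The trade-off is one open file per searching thread, which can add up
against a per-process descriptor limit.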

-----Original Message-----
From: Nathan Brackett [mailto:nbrackett@net-temps.com]
Sent: Monday, July 11, 2005 5:43 PM
To: java-user@lucene.apache.org
Subject: RE: Search deadlocking under load


Thanks for the advice. That ought to reduce contention a bit in that
particular method.

I've been reviewing a large number of thread dumps today and I was wondering
if it's common to see many threads that look like this:

"tcpConnection-8080-20" daemon prio=5 tid=0x081ba000 nid=0x810ac00 waiting
for monitor entry [bf24b000..bf24bd20]
        at
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:216)
        - waiting to lock <0x2ee24c48> (a
org.apache.lucene.store.FSInputStream)

When I get the deadlock situation, I often see a few of these lying around,
but no matching thread that actually has the lock on 0x2ee24c48 in the dump.
Is this normal? I'm not really a thread dump pro.



-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Monday, July 11, 2005 1:57 PM
To: java-user@lucene.apache.org
Subject: RE: Search deadlocking under load


Hi Nathan,

Without looking at the source of that method, I'd suggest first trying
the multifile index format (you can easily convert to it by setting the
new format on IndexWriter and optimizing it).  I'd be interested to
know if this eliminates the problem, or at least makes it harder to
hit.

Otis


--- Nathan Brackett <nb...@net-temps.com> wrote:

> Hey Otis,
>
> Thanks for the speedy response and apologies for my delayed response.
> It was
> Friday and time to go :)
>
> The queries we're running are very varied (wildcard, phrase, normal).
> The
> index is only about a 1/2 gig in size (maybe 250,000 documents). The
> machine
> is running FreeBSD 5.3 with ~2 gig RAM.
>
> I got a thread dump from right around the time that the process would
> deadlock and not come back and I noticed that almost all of the
> threads were
> waiting on the same method. Here's what the trace looks like: (small
> sample
> for the sake of brevity...the real dump is huge)
>
> tcpConnection-8080-32:
>   [1]
> org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal
> (CompoundFileReader.java:217)
>   [2] org.apache.lucene.store.InputStream.refill
> (InputStream.java:158)
>   [3] org.apache.lucene.store.InputStream.readByte
> (InputStream.java:43)
>   [4] org.apache.lucene.store.InputStream.readVInt
> (InputStream.java:83)
>   [5] org.apache.lucene.index.SegmentTermDocs.read
> (SegmentTermDocs.java:126)
>   [6] org.apache.lucene.search.TermScorer.next (TermScorer.java:55)
>   [7] org.apache.lucene.search.BooleanScorer.next
> (BooleanScorer.java:112)
>   [8] org.apache.lucene.search.Scorer.score (Scorer.java:37)
>   [9] org.apache.lucene.search.IndexSearcher.search
> (IndexSearcher.java:92)
>   [10] org.apache.lucene.search.Hits.getMoreDocs (Hits.java:64)
>   [11] org.apache.lucene.search.Hits.<init> (Hits.java:43)
>   [12] org.apache.lucene.search.Searcher.search (Searcher.java:33)
>   [13] org.apache.lucene.search.Searcher.search (Searcher.java:27)
>   [14] com.nettemps.search.backend.SingleIndexManager.search
> (SingleIndexManager.java:335)
>   [15] com.nettemps.search.backend.IndexAccessControl.doSearch
> (IndexAccessControl.java:100)
>   [16] com.nettemps.search.server.SearchServerImpl.searchResumes
> (SearchServerImpl.java:402)
>   [17]
>
com.nettemps.search.server.SearchServerReadOnly_Tie.invoke_searchResumes
> (SearchServerReadOnly_Tie.java:93)
>   [18]
> com.nettemps.search.server.SearchServerReadOnly_Tie.processingHook
> (SearchServerReadOnly_Tie.java:298)
>   [19] com.sun.xml.rpc.server.StreamingHandler.handle
> (StreamingHandler.java:321)
>   [20] com.sun.xml.rpc.server.http.JAXRPCServletDelegate.doPost
> (JAXRPCServletDelegate.java:443)
>   [21] com.sun.xml.rpc.server.http.JAXRPCServlet.doPost
> (JAXRPCServlet.java:102)
>   [22] javax.servlet.http.HttpServlet.service (HttpServlet.java:165)
>   [23] javax.servlet.http.HttpServlet.service (HttpServlet.java:103)
>   [24] com.caucho.server.http.FilterChainServlet.doFilter
> (FilterChainServlet.java:96)
>   [25] com.caucho.server.http.Invocation.service
> (Invocation.java:315)
>   [26] com.caucho.server.http.CacheInvocation.service
> (CacheInvocation.java:135)
>   [27] com.caucho.server.http.HttpRequest.handleRequest
> (HttpRequest.java:253)
>   [28] com.caucho.server.http.HttpRequest.handleConnection
> (HttpRequest.java:170)
>   [29] com.caucho.server.TcpConnection.run (TcpConnection.java:139)
>   [30] java.lang.Thread.run (Thread.java:534)
>
> I took a look at that readInternal method and saw that the contention
> is
> around an InputStream that I assume reads from the actual index file
> and
> returns it for use by the method. We are running many threads that
> are
> attempting to do searches at the same time (roughly 30 - 35), so that
> explains why the search times would go up.
>
> In an attempt to reduce the amount of contention, we synchronized our
> search
> method (the one that makes the actual call to Lucene's search: [14]
> com.nettemps.search.backend.SingleIndexManager.search
> (SingleIndexManager.java:335)). This also caused the same results
> when hit
> with too many threads.
>
> We're really stuck at this point as to what to try. Any advice?
>
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Friday, July 08, 2005 3:40 PM
> To: java-user@lucene.apache.org
> Subject: Re: Search deadlocking under load
>
>
> Nathan,
>
> 3) is the recommended usage.
> Your index is on an NFS share, which means you are searching it over
> the network.  Make it local, and you should see performance
> improvements.  Local or remote, it makes sense that searches take
> longer to execute as the load goes up.  Yes, it shouldn't deadlock.
> You shouldn't need to synchronize access to IndexSearcher.
> When your JVM locks up next time, kill it, get the thread dump, and
> send it to the list, so we can try to remove the bottleneck, if
> that's
> possible.
>
> How many queries/second do you run, and what kinds of queries are
> they,
> how big is your index and what kind of hardware (disks, RAM, CPU) are
> you using?
>
> Otis
>
> --- Nathan Brackett <nb...@net-temps.com> wrote:
>
> > Hey all,
> >
> > We're looking to use Lucene as the back end to our website and
> we're
> > running
> > into an unusual deadlocking problem.
> >
> > For testing purposes, we're just running one web server (threaded
> > environment) against an index mounted on an NFS share. This machine
> > performs
> > searches only against this index so it's not being touched. We have
> > tried a
> > few different models so far:
> >
> > 1) Pooling IndexSearcher objects: Occasionally we would run into
> > OutOfMemory
> > problems as we would not block if a request came through and all
> > IndexSearchers were already checked out, we would just create a
> > temporary
> > one and then dispose of it once it was returned to the pool.
> >
> > 2) Create a new IndexSearcher each time: Every request to search
> > would
> > create an IndexSearcher object. This quickly gave OutOfMemory
> errors,
> > even
> > when we would close them out directly after.
> >
> > 3) Use a global IndexSearcher: This is the model we're working with
> > now. The
> > model holds up fine under low-moderate load and is, in fact, much
> > faster at
> > searching (probably due to some caching mechanism). Under heavy
> load
> > though,
> > the CPU will spike up to 99% and never come back down until we kill
> > -9 the
> > process. Also, as you ramp the load, we've discovered that search
> > times go
> > up as well. Searches will generally come back after 40ms, but as
> the
> > load
> > goes up the searches don't come back for up to 20 seconds.
> >
> > We've been attempting to find where the problem is for the last
> week
> > with no
> > luck. Our index is optimized, so there is only one file. Do we need
> > to
> > synchronize access to the global IndexSearcher so that only one
> > search can
> > run at a time? That poses a bit of a problem as if a particular
> > search takes
> > a long time, all others will wait. This problem does not look like
> an
> > OutOfMemory error because the memory usage when the spike occurs is
> > usually
> > in the range of 150meg used with a ceiling of 650meg. Anyone else
> > experiencing any problems like this or have any idea where we
> should
> > be
> > looking? Thanks.
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org







RE: Search deadlocking under load

Posted by Nathan Brackett <nb...@net-temps.com>.
Thanks for the advice. That ought to reduce contention a bit in that
particular method.

I've been reviewing a large number of thread dumps today and I was wondering
if it's common to see many threads that look like this:

"tcpConnection-8080-20" daemon prio=5 tid=0x081ba000 nid=0x810ac00 waiting
for monitor entry [bf24b000..bf24bd20]
        at
org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:216)
        - waiting to lock <0x2ee24c48> (a
org.apache.lucene.store.FSInputStream)

When I get the deadlock situation, I often see a few of these lying around,
but no matching thread that actually has the lock on 0x2ee24c48 in the dump.
Is this normal? I'm not really a thread dump pro.



-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Monday, July 11, 2005 1:57 PM
To: java-user@lucene.apache.org
Subject: RE: Search deadlocking under load


Hi Nathan,

Without looking at the source of that method, I'd suggest first trying
the multifile index format (you can easily convert to it by setting the
new format on IndexWriter and optimizing it).  I'd be interested to
know if this eliminates the problem, or at least makes it harder to
hit.

Otis


--- Nathan Brackett <nb...@net-temps.com> wrote:

> Hey Otis,
>
> Thanks for the speedy response and apologies for my delayed response.
> It was
> Friday and time to go :)
>
> The queries we're running are very varied (wildcard, phrase, normal).
> The
> index is only about a 1/2 gig in size (maybe 250,000 documents). The
> machine
> is running FreeBSD 5.3 with ~2 gig RAM.
>
> I got a thread dump from right around the time that the process would
> deadlock and not come back and I noticed that almost all of the
> threads were
> waiting on the same method. Here's what the trace looks like: (small
> sample
> for the sake of brevity...the real dump is huge)
>
> tcpConnection-8080-32:
>   [1]
> org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal
> (CompoundFileReader.java:217)
>   [2] org.apache.lucene.store.InputStream.refill
> (InputStream.java:158)
>   [3] org.apache.lucene.store.InputStream.readByte
> (InputStream.java:43)
>   [4] org.apache.lucene.store.InputStream.readVInt
> (InputStream.java:83)
>   [5] org.apache.lucene.index.SegmentTermDocs.read
> (SegmentTermDocs.java:126)
>   [6] org.apache.lucene.search.TermScorer.next (TermScorer.java:55)
>   [7] org.apache.lucene.search.BooleanScorer.next
> (BooleanScorer.java:112)
>   [8] org.apache.lucene.search.Scorer.score (Scorer.java:37)
>   [9] org.apache.lucene.search.IndexSearcher.search
> (IndexSearcher.java:92)
>   [10] org.apache.lucene.search.Hits.getMoreDocs (Hits.java:64)
>   [11] org.apache.lucene.search.Hits.<init> (Hits.java:43)
>   [12] org.apache.lucene.search.Searcher.search (Searcher.java:33)
>   [13] org.apache.lucene.search.Searcher.search (Searcher.java:27)
>   [14] com.nettemps.search.backend.SingleIndexManager.search
> (SingleIndexManager.java:335)
>   [15] com.nettemps.search.backend.IndexAccessControl.doSearch
> (IndexAccessControl.java:100)
>   [16] com.nettemps.search.server.SearchServerImpl.searchResumes
> (SearchServerImpl.java:402)
>   [17]
>
com.nettemps.search.server.SearchServerReadOnly_Tie.invoke_searchResumes
> (SearchServerReadOnly_Tie.java:93)
>   [18]
> com.nettemps.search.server.SearchServerReadOnly_Tie.processingHook
> (SearchServerReadOnly_Tie.java:298)
>   [19] com.sun.xml.rpc.server.StreamingHandler.handle
> (StreamingHandler.java:321)
>   [20] com.sun.xml.rpc.server.http.JAXRPCServletDelegate.doPost
> (JAXRPCServletDelegate.java:443)
>   [21] com.sun.xml.rpc.server.http.JAXRPCServlet.doPost
> (JAXRPCServlet.java:102)
>   [22] javax.servlet.http.HttpServlet.service (HttpServlet.java:165)
>   [23] javax.servlet.http.HttpServlet.service (HttpServlet.java:103)
>   [24] com.caucho.server.http.FilterChainServlet.doFilter
> (FilterChainServlet.java:96)
>   [25] com.caucho.server.http.Invocation.service
> (Invocation.java:315)
>   [26] com.caucho.server.http.CacheInvocation.service
> (CacheInvocation.java:135)
>   [27] com.caucho.server.http.HttpRequest.handleRequest
> (HttpRequest.java:253)
>   [28] com.caucho.server.http.HttpRequest.handleConnection
> (HttpRequest.java:170)
>   [29] com.caucho.server.TcpConnection.run (TcpConnection.java:139)
>   [30] java.lang.Thread.run (Thread.java:534)
>
> I took a look at that readInternal method and saw that the contention
> is
> around an InputStream that I assume reads from the actual index file
> and
> returns it for use by the method. We are running many threads that
> are
> attempting to do searches at the same time (roughly 30 - 35), so that
> explains why the search times would go up.
>
> In an attempt to reduce the amount of contention, we synchronized our
> search
> method (the one that makes the actual call to Lucene's search: [14]
> com.nettemps.search.backend.SingleIndexManager.search
> (SingleIndexManager.java:335)). This also caused the same results
> when hit
> with too many threads.
>
> We're really stuck at this point as to what to try. Any advice?
>
>
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Friday, July 08, 2005 3:40 PM
> To: java-user@lucene.apache.org
> Subject: Re: Search deadlocking under load
>
>
> Nathan,
>
> 3) is the recommended usage.
> Your index is on an NFS share, which means you are searching it over
> the network.  Make it local, and you should see performance
> improvements.  Local or remote, it makes sense that searches take
> longer to execute as the load goes up.  Yes, it shouldn't deadlock.
> You shouldn't need to synchronize access to IndexSearcher.
> When your JVM locks up next time, kill it, get the thread dump, and
> send it to the list, so we can try to remove the bottleneck, if
> that's
> possible.
>
> How many queries/second do you run, and what kinds of queries are
> they,
> how big is your index and what kind of hardware (disks, RAM, CPU) are
> you using?
>
> Otis
>
> --- Nathan Brackett <nb...@net-temps.com> wrote:
>
> > Hey all,
> >
> > We're looking to use Lucene as the back end to our website and
> we're
> > running
> > into an unusual deadlocking problem.
> >
> > For testing purposes, we're just running one web server (threaded
> > environment) against an index mounted on an NFS share. This machine
> > performs
> > searches only against this index so it's not being touched. We have
> > tried a
> > few different models so far:
> >
> > 1) Pooling IndexSearcher objects: Occasionally we would run into
> > OutOfMemory
> > problems as we would not block if a request came through and all
> > IndexSearchers were already checked out, we would just create a
> > temporary
> > one and then dispose of it once it was returned to the pool.
> >
> > 2) Create a new IndexSearcher each time: Every request to search
> > would
> > create an IndexSearcher object. This quickly gave OutOfMemory
> errors,
> > even
> > when we would close them out directly after.
> >
> > 3) Use a global IndexSearcher: This is the model we're working with
> > now. The
> > model holds up fine under low-moderate load and is, in fact, much
> > faster at
> > searching (probably due to some caching mechanism). Under heavy
> load
> > though,
> > the CPU will spike up to 99% and never come back down until we kill
> > -9 the
> > process. Also, as you ramp the load, we've discovered that search
> > times go
> > up as well. Searches will generally come back after 40ms, but as
> the
> > load
> > goes up the searches don't come back for up to 20 seconds.
> >
> > We've been attempting to find where the problem is for the last
> week
> > with no
> > luck. Our index is optimized, so there is only one file. Do we need
> > to
> > synchronize access to the global IndexSearcher so that only one
> > search can
> > run at a time? That poses a bit of a problem as if a particular
> > search takes
> > a long time, all others will wait. This problem does not look like
> an
> > OutOfMemory error because the memory usage when the spike occurs is
> > usually
> > in the range of 150meg used with a ceiling of 650meg. Anyone else
> > experiencing any problems like this or have any idea where we
> should
> > be
> > looking? Thanks.
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org







RE: Search deadlocking under load

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Nathan,

Without looking at the source of that method, I'd suggest first trying
the multifile index format (you can easily convert to it by setting the
new format on IndexWriter and optimizing it).  I'd be interested to
know if this eliminates the problem, or at least makes it harder to
hit.
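
A rough sketch of that conversion (the index path and analyzer are
placeholders, and it assumes the 1.4-era IndexWriter API):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;

  // Reopen the existing index, switch off the compound file format,
  // and optimize so the segments are rewritten as separate files.
  public class ConvertToMultifile {
      public static void main(String[] args) throws Exception {
          IndexWriter writer =
              new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
          writer.setUseCompoundFile(false); // write the multifile format
          writer.optimize();                // rewrites the entire index
          writer.close();
      }
  }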

Otis


--- Nathan Brackett <nb...@net-temps.com> wrote:

> Hey Otis,
> 
> Thanks for the speedy response and apologies for my delayed response.
> It was
> Friday and time to go :)
> 
> The queries we're running are very varied (wildcard, phrase, normal).
> The
> index is only about a 1/2 gig in size (maybe 250,000 documents). The
> machine
> is running FreeBSD 5.3 with ~2 gig RAM.
> 
> I got a thread dump from right around the time that the process would
> deadlock and not come back and I noticed that almost all of the
> threads were
> waiting on the same method. Here's what the trace looks like: (small
> sample
> for the sake of brevity...the real dump is huge)
> 
> tcpConnection-8080-32:
>   [1]
> org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal
> (CompoundFileReader.java:217)
>   [2] org.apache.lucene.store.InputStream.refill
> (InputStream.java:158)
>   [3] org.apache.lucene.store.InputStream.readByte
> (InputStream.java:43)
>   [4] org.apache.lucene.store.InputStream.readVInt
> (InputStream.java:83)
>   [5] org.apache.lucene.index.SegmentTermDocs.read
> (SegmentTermDocs.java:126)
>   [6] org.apache.lucene.search.TermScorer.next (TermScorer.java:55)
>   [7] org.apache.lucene.search.BooleanScorer.next
> (BooleanScorer.java:112)
>   [8] org.apache.lucene.search.Scorer.score (Scorer.java:37)
>   [9] org.apache.lucene.search.IndexSearcher.search
> (IndexSearcher.java:92)
>   [10] org.apache.lucene.search.Hits.getMoreDocs (Hits.java:64)
>   [11] org.apache.lucene.search.Hits.<init> (Hits.java:43)
>   [12] org.apache.lucene.search.Searcher.search (Searcher.java:33)
>   [13] org.apache.lucene.search.Searcher.search (Searcher.java:27)
>   [14] com.nettemps.search.backend.SingleIndexManager.search
> (SingleIndexManager.java:335)
>   [15] com.nettemps.search.backend.IndexAccessControl.doSearch
> (IndexAccessControl.java:100)
>   [16] com.nettemps.search.server.SearchServerImpl.searchResumes
> (SearchServerImpl.java:402)
>   [17]
>
com.nettemps.search.server.SearchServerReadOnly_Tie.invoke_searchResumes
> (SearchServerReadOnly_Tie.java:93)
>   [18]
> com.nettemps.search.server.SearchServerReadOnly_Tie.processingHook
> (SearchServerReadOnly_Tie.java:298)
>   [19] com.sun.xml.rpc.server.StreamingHandler.handle
> (StreamingHandler.java:321)
>   [20] com.sun.xml.rpc.server.http.JAXRPCServletDelegate.doPost
> (JAXRPCServletDelegate.java:443)
>   [21] com.sun.xml.rpc.server.http.JAXRPCServlet.doPost
> (JAXRPCServlet.java:102)
>   [22] javax.servlet.http.HttpServlet.service (HttpServlet.java:165)
>   [23] javax.servlet.http.HttpServlet.service (HttpServlet.java:103)
>   [24] com.caucho.server.http.FilterChainServlet.doFilter
> (FilterChainServlet.java:96)
>   [25] com.caucho.server.http.Invocation.service
> (Invocation.java:315)
>   [26] com.caucho.server.http.CacheInvocation.service
> (CacheInvocation.java:135)
>   [27] com.caucho.server.http.HttpRequest.handleRequest
> (HttpRequest.java:253)
>   [28] com.caucho.server.http.HttpRequest.handleConnection
> (HttpRequest.java:170)
>   [29] com.caucho.server.TcpConnection.run (TcpConnection.java:139)
>   [30] java.lang.Thread.run (Thread.java:534)
> 
> I took a look at that readInternal method and saw that the contention
> is
> around an InputStream that I assume reads from the actual index file
> and
> returns it for use by the method. We are running many threads that
> are
> attempting to do searches at the same time (roughly 30 - 35), so that
> explains why the search times would go up.
> 
> In an attempt to reduce the amount of contention, we synchronized our
> search
> method (the one that makes the actual call to Lucene's search: [14]
> com.nettemps.search.backend.SingleIndexManager.search
> (SingleIndexManager.java:335)). This also caused the same results
> when hit
> with too many threads.
> 
> We're really stuck at this point as to what to try. Any advice?
> 
> 
> 
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Friday, July 08, 2005 3:40 PM
> To: java-user@lucene.apache.org
> Subject: Re: Search deadlocking under load
> 
> 
> Nathan,
> 
> 3) is the recommended usage.
> Your index is on an NFS share, which means you are searching it over
> the network.  Make it local, and you should see performance
> improvements.  Local or remote, it makes sense that searches take
> longer to execute as the load goes up.  Yes, it shouldn't deadlock.
> You shouldn't need to synchronize access to IndexSearcher.
> When your JVM locks up next time, kill it, get the thread dump, and
> send it to the list, so we can try to remove the bottleneck, if
> that's
> possible.
> 
> How many queries/second do you run, and what kinds of queries are
> they,
> how big is your index and what kind of hardware (disks, RAM, CPU) are
> you using?
> 
> Otis
> 
> --- Nathan Brackett <nb...@net-temps.com> wrote:
> 
> > Hey all,
> >
> > We're looking to use Lucene as the back end to our website and
> we're
> > running
> > into an unusual deadlocking problem.
> >
> > For testing purposes, we're just running one web server (threaded
> > environment) against an index mounted on an NFS share. This machine
> > performs
> > searches only against this index so it's not being touched. We have
> > tried a
> > few different models so far:
> >
> > 1) Pooling IndexSearcher objects: Occasionally we would run into
> > OutOfMemory
> > problems as we would not block if a request came through and all
> > IndexSearchers were already checked out, we would just create a
> > temporary
> > one and then dispose of it once it was returned to the pool.
> >
> > 2) Create a new IndexSearcher each time: Every request to search
> > would
> > create an IndexSearcher object. This quickly gave OutOfMemory
> errors,
> > even
> > when we would close them out directly after.
> >
> > 3) Use a global IndexSearcher: This is the model we're working with
> > now. The
> > model holds up fine under low-moderate load and is, in fact, much
> > faster at
> > searching (probably due to some caching mechanism). Under heavy
> load
> > though,
> > the CPU will spike up to 99% and never come back down until we kill
> > -9 the
> > process. Also, as you ramp the load, we've discovered that search
> > times go
> > up as well. Searches will generally come back after 40ms, but as
> the
> > load
> > goes up the searches don't come back for up to 20 seconds.
> >
> > We've been attempting to find where the problem is for the last
> week
> > with no
> > luck. Our index is optimized, so there is only one file. Do we need
> > to
> > synchronize access to the global IndexSearcher so that only one
> > search can
> > run at a time? That poses a bit of a problem as if a particular
> > search takes
> > a long time, all others will wait. This problem does not look like
> an
> > OutOfMemory error because the memory usage when the spike occurs is
> > usually
> > in the range of 150meg used with a ceiling of 650meg. Anyone else
> > experiencing any problems like this or have any idea where we
> should
> > be
> > looking? Thanks.
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Search deadlocking under load

Posted by Nathan Brackett <nb...@net-temps.com>.
Hey Otis,

Thanks for the speedy response and apologies for my delayed response. It was
Friday and time to go :)

The queries we're running are very varied (wildcard, phrase, normal). The
index is only about a 1/2 gig in size (maybe 250,000 documents). The machine
is running FreeBSD 5.3 with ~2 gig RAM.

I got a thread dump from right around the time that the process would
deadlock and not come back and I noticed that almost all of the threads were
waiting on the same method. Here's what the trace looks like: (small sample
for the sake of brevity...the real dump is huge)

tcpConnection-8080-32:
  [1] org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal
(CompoundFileReader.java:217)
  [2] org.apache.lucene.store.InputStream.refill (InputStream.java:158)
  [3] org.apache.lucene.store.InputStream.readByte (InputStream.java:43)
  [4] org.apache.lucene.store.InputStream.readVInt (InputStream.java:83)
  [5] org.apache.lucene.index.SegmentTermDocs.read
(SegmentTermDocs.java:126)
  [6] org.apache.lucene.search.TermScorer.next (TermScorer.java:55)
  [7] org.apache.lucene.search.BooleanScorer.next (BooleanScorer.java:112)
  [8] org.apache.lucene.search.Scorer.score (Scorer.java:37)
  [9] org.apache.lucene.search.IndexSearcher.search (IndexSearcher.java:92)
  [10] org.apache.lucene.search.Hits.getMoreDocs (Hits.java:64)
  [11] org.apache.lucene.search.Hits.<init> (Hits.java:43)
  [12] org.apache.lucene.search.Searcher.search (Searcher.java:33)
  [13] org.apache.lucene.search.Searcher.search (Searcher.java:27)
  [14] com.nettemps.search.backend.SingleIndexManager.search
(SingleIndexManager.java:335)
  [15] com.nettemps.search.backend.IndexAccessControl.doSearch
(IndexAccessControl.java:100)
  [16] com.nettemps.search.server.SearchServerImpl.searchResumes
(SearchServerImpl.java:402)
  [17]
com.nettemps.search.server.SearchServerReadOnly_Tie.invoke_searchResumes
(SearchServerReadOnly_Tie.java:93)
  [18] com.nettemps.search.server.SearchServerReadOnly_Tie.processingHook
(SearchServerReadOnly_Tie.java:298)
  [19] com.sun.xml.rpc.server.StreamingHandler.handle
(StreamingHandler.java:321)
  [20] com.sun.xml.rpc.server.http.JAXRPCServletDelegate.doPost
(JAXRPCServletDelegate.java:443)
  [21] com.sun.xml.rpc.server.http.JAXRPCServlet.doPost
(JAXRPCServlet.java:102)
  [22] javax.servlet.http.HttpServlet.service (HttpServlet.java:165)
  [23] javax.servlet.http.HttpServlet.service (HttpServlet.java:103)
  [24] com.caucho.server.http.FilterChainServlet.doFilter
(FilterChainServlet.java:96)
  [25] com.caucho.server.http.Invocation.service (Invocation.java:315)
  [26] com.caucho.server.http.CacheInvocation.service
(CacheInvocation.java:135)
  [27] com.caucho.server.http.HttpRequest.handleRequest
(HttpRequest.java:253)
  [28] com.caucho.server.http.HttpRequest.handleConnection
(HttpRequest.java:170)
  [29] com.caucho.server.TcpConnection.run (TcpConnection.java:139)
  [30] java.lang.Thread.run (Thread.java:534)

I took a look at that readInternal method and saw that the contention is
around an InputStream that I assume reads from the actual index file and
returns it for use by the method. We are running many threads that are
attempting to do searches at the same time (roughly 30 - 35), so that
explains why the search times would go up.

In an attempt to reduce the amount of contention, we synchronized our search
method (the one that makes the actual call to Lucene's search: [14]
com.nettemps.search.backend.SingleIndexManager.search
(SingleIndexManager.java:335)). This also caused the same results when hit
with too many threads.
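
In code, that amounts to something like this (a reconstruction; the real
SingleIndexManager source is not shown here):

  import java.io.IOException;

  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.Searcher;

  // Reconstruction of the coarse lock described above: every query is
  // serialized through one monitor, so Lucene's internal descriptor
  // lock is never contended, but throughput drops to one search at a
  // time and the request backlog just builds up one level higher.
  public class SerializedSearch {

      private final Searcher searcher;

      public SerializedSearch(Searcher searcher) {
          this.searcher = searcher;
      }

      public synchronized Hits search(Query query) throws IOException {
          return searcher.search(query);
      }
  }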

We're really stuck at this point as to what to try. Any advice?



-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Friday, July 08, 2005 3:40 PM
To: java-user@lucene.apache.org
Subject: Re: Search deadlocking under load


Nathan,

3) is the recommended usage.
Your index is on an NFS share, which means you are searching it over
the network.  Make it local, and you should see performance
improvements.  Local or remote, it makes sense that searches take
longer to execute as the load goes up.  Yes, it shouldn't deadlock.
You shouldn't need to synchronize access to IndexSearcher.
When your JVM locks up next time, kill it, get the thread dump, and
send it to the list, so we can try to remove the bottleneck, if that's
possible.

How many queries/second do you run, and what kinds of queries are they,
how big is your index and what kind of hardware (disks, RAM, CPU) are
you using?

Otis

--- Nathan Brackett <nb...@net-temps.com> wrote:

> Hey all,
>
> We're looking to use Lucene as the back end to our website and we're
> running
> into an unusual deadlocking problem.
>
> For testing purposes, we're just running one web server (threaded
> environment) against an index mounted on an NFS share. This machine
> performs
> searches only against this index so it's not being touched. We have
> tried a
> few different models so far:
>
> 1) Pooling IndexSearcher objects: Occasionally we would run into
> OutOfMemory
> problems as we would not block if a request came through and all
> IndexSearchers were already checked out, we would just create a
> temporary
> one and then dispose of it once it was returned to the pool.
>
> 2) Create a new IndexSearcher each time: Every request to search
> would
> create an IndexSearcher object. This quickly gave OutOfMemory errors,
> even
> when we would close them out directly after.
>
> 3) Use a global IndexSearcher: This is the model we're working with
> now. The
> model holds up fine under low-moderate load and is, in fact, much
> faster at
> searching (probably due to some caching mechanism). Under heavy load
> though,
> the CPU will spike up to 99% and never come back down until we kill
> -9 the
> process. Also, as you ramp the load, we've discovered that search
> times go
> up as well. Searches will generally come back after 40ms, but as the
> load
> goes up the searches don't come back for up to 20 seconds.
>
> We've been attempting to find where the problem is for the last week
> with no
> luck. Our index is optimized, so there is only one file. Do we need
> to
> synchronize access to the global IndexSearcher so that only one
> search can
> run at a time? That poses a bit of a problem as if a particular
> search takes
> a long time, all others will wait. This problem does not look like an
> OutOfMemory error because the memory usage when the spike occurs is
> usually
> in the range of 150meg used with a ceiling of 650meg. Anyone else
> experiencing any problems like this or have any idea where we should
> be
> looking? Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org







Re: Search deadlocking under load

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Nathan,

3) is the recommended usage.
Your index is on an NFS share, which means you are searching it over
the network.  Make it local, and you should see performance
improvements.  Local or remote, it makes sense that searches take
longer to execute as the load goes up.  Yes, it shouldn't deadlock.
You shouldn't need to synchronize access to IndexSearcher.
When your JVM locks up next time, kill it, get the thread dump, and
send it to the list, so we can try to remove the bottleneck, if that's
possible.
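
In code form, the recommended model is roughly this minimal sketch,
assuming the index has been copied to local disk (the path is a
placeholder):

  import java.io.IOException;

  import org.apache.lucene.search.IndexSearcher;

  // One shared, long-lived IndexSearcher over a local copy of the
  // index. Searches need no external synchronization; only the lazy
  // initialization is guarded.
  public class SearcherHolder {

      private static IndexSearcher searcher;

      public static synchronized IndexSearcher get() throws IOException {
          if (searcher == null) {
              searcher = new IndexSearcher("/local/copy/of/index");
          }
          return searcher;
      }
  }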

How many queries/second do you run, and what kinds of queries are they,
how big is your index and what kind of hardware (disks, RAM, CPU) are
you using?

Otis

--- Nathan Brackett <nb...@net-temps.com> wrote:

> Hey all,
> 
> We're looking to use Lucene as the back end to our website and we're
> running
> into an unusual deadlocking problem.
> 
> For testing purposes, we're just running one web server (threaded
> environment) against an index mounted on an NFS share. This machine
> performs
> searches only against this index so it's not being touched. We have
> tried a
> few different models so far:
> 
> 1) Pooling IndexSearcher objects: Occasionally we would run into
> OutOfMemory
> problems as we would not block if a request came through and all
> IndexSearchers were already checked out, we would just create a
> temporary
> one and then dispose of it once it was returned to the pool.
> 
> 2) Create a new IndexSearcher each time: Every request to search
> would
> create an IndexSearcher object. This quickly gave OutOfMemory errors,
> even
> when we would close them out directly after.
> 
> 3) Use a global IndexSearcher: This is the model we're working with
> now. The
> model holds up fine under low-moderate load and is, in fact, much
> faster at
> searching (probably due to some caching mechanism). Under heavy load
> though,
> the CPU will spike up to 99% and never come back down until we kill
> -9 the
> process. Also, as you ramp the load, we've discovered that search
> times go
> up as well. Searches will generally come back after 40ms, but as the
> load
> goes up the searches don't come back for up to 20 seconds.
> 
> We've been attempting to find where the problem is for the last week
> with no
> luck. Our index is optimized, so there is only one file. Do we need
> to
> synchronize access to the global IndexSearcher so that only one
> search can
> run at a time? That poses a bit of a problem as if a particular
> search takes
> a long time, all others will wait. This problem does not look like an
> OutOfMemory error because the memory usage when the spike occurs is
> usually
> in the range of 150meg used with a ceiling of 650meg. Anyone else
> experiencing any problems like this or have any idea where we should
> be
> looking? Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org