Posted to dev@lucenenet.apache.org by Scott Zhang <ge...@gmail.com> on 2009/07/02 20:40:34 UTC

Re: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?

Hi Wagner Junior,
Thanks for your message.

I thought this mailing list was dead. lol.

I copied the sample code from the test/demo application distributed with
Lucene.Net:

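// Run the query; Hits reports the total number of matches.
Hits hits = searcher.Search(rootQuery);
iResultCount = hits.Length();

// Work out the slice of results that belongs to the requested page.
int start = pageNum * pageSize;
int end = System.Math.Min(hits.Length(), start + pageSize);
List<string> bookIdList = new List<string>();
for (int i = start; i < end; i++)
{
    // Fetch the stored document for each hit on this page.
    Document doc = hits.Doc(i);
}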


But when I checked the Lucene.Net code, I found this in Hits.cs, L105:

TopDocs topDocs = (sort == null) ? searcher.Search(weight, filter, n) :
    searcher.Search(weight, filter, n, sort);

length = topDocs.totalHits;

Then in IndexSearcher.cs, L179, there is this statement:

scorer.Score(collector);

The implementation of the Score function is (Scorer.cs, L64):

public virtual void Score(HitCollector hc)
{
    while (Next())
    {
        hc.Collect(Doc(), Score());
    }
}

Or in BooleanScorer2.cs, L411:

public override void Score(HitCollector hc)
{
    if (allowDocsOutOfOrder && requiredScorers.Count == 0 &&
        prohibitedScorers.Count < 32)
    {
        // fall back to BooleanScorer, scores documents somewhat out of order
        BooleanScorer bs = new BooleanScorer(GetSimilarity(), minNrShouldMatch);
        System.Collections.IEnumerator si = optionalScorers.GetEnumerator();
        while (si.MoveNext())
        {
            bs.Add((Scorer) si.Current, false, false);
        }
        si = prohibitedScorers.GetEnumerator();
        while (si.MoveNext())
        {
            bs.Add((Scorer) si.Current, false, true);
        }
        bs.Score(hc);
    }
    else
    {
        if (countingSumScorer == null)
        {
            InitCountingSumScorer();
        }
        while (countingSumScorer.Next())
        {
            hc.Collect(countingSumScorer.Doc(), Score());
        }
    }
}

So it seems that whichever search method I use, the Lucene.Net
implementation always goes through a HitCollector. Is that right?
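
For illustration: being called once for every hit does not by itself mean
holding every hit in memory. Below is a minimal sketch of a collector that
sees each match but remembers only the best few. It is standalone C#, and
HitCollectorStandIn and TopNCollector are hypothetical names invented for
this example, not Lucene.Net types. The linear insertion is chosen for
brevity, not speed; a priority queue would be the usual choice.

```csharp
using System;
using System.Collections.Generic;

namespace TopNSketch
{
    // Hypothetical stand-in for a per-hit callback such as HitCollector,
    // declared here only so the sketch compiles without Lucene.Net.
    public abstract class HitCollectorStandIn
    {
        public abstract void Collect(int doc, float score);
    }

    // Hypothetical collector: sees every hit, remembers only the best maxHits.
    public sealed class TopNCollector : HitCollectorStandIn
    {
        private readonly int maxHits;
        private int totalHits;

        // Kept in ascending score order, so kept[0] is the weakest retained hit.
        private readonly List<KeyValuePair<float, int>> kept =
            new List<KeyValuePair<float, int>>();

        public TopNCollector(int maxHits)
        {
            if (maxHits < 1) throw new ArgumentOutOfRangeException("maxHits");
            this.maxHits = maxHits;
        }

        public int TotalHits { get { return totalHits; } }
        public int KeptCount { get { return kept.Count; } }

        public override void Collect(int doc, float score)
        {
            totalHits++; // counting needs no extra memory per hit
            if (kept.Count == maxHits && score <= kept[0].Key)
            {
                return; // not better than the weakest retained hit
            }
            int pos = 0;
            while (pos < kept.Count && kept[pos].Key < score) pos++;
            kept.Insert(pos, new KeyValuePair<float, int>(score, doc));
            if (kept.Count > maxHits) kept.RemoveAt(0); // drop the weakest
        }

        // Document ids of the retained hits, best score first.
        public int[] TopDocIds()
        {
            int[] ids = new int[kept.Count];
            for (int i = 0; i < kept.Count; i++)
            {
                ids[i] = kept[kept.Count - 1 - i].Value;
            }
            return ids;
        }
    }
}
```

However many times Collect is called, the list never grows beyond maxHits
entries, while TotalHits still reports the full match count.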


Another thing: I recompiled Lucene.Net and re-uploaded the DLL to my server.
Now, when I search for the keyword "book", which gives me a count of 30M
records, w3wp.exe consumes 1.1 GB of memory, which seems abnormal. But
Lucene.Net doesn't throw OutOfMemory anymore. That is weird.


Thanks.
Regards.
Scott
On Thu, Jul 2, 2009 at 2:20 AM, Wagner Ignacio Pinto Junior <
wagneripjr@hotmail.com> wrote:

> Hi Scott,
>
> I was reading Lucene in Action, and it warns against reading all hits at
> once.
>
> Do you use Hits or HitCollector?
>
> If you use HitCollector, or iterate over all the hits, that's the problem.
>
> Try paging through the hits; Hits uses lazy loading.
>
> I'm new to Lucene, so sorry if I made any mistakes here ;)
>
> Wagner Junior
>
> > Date: Wed, 1 Jul 2009 01:09:55 +0800
> > Subject: Hi. Does anyone know how to solve the OutOfMemory Exception
> > during Search?
> > From: getyourcontacts@gmail.com
> > To: lucene-net-dev@incubator.apache.org
> >
> > Hi. I have created an index with Lucene.Net which contains 30M documents.
> > The resulting index is ~4 GB.
> > Now the problem is that when I search for a keyword that matches very many
> > documents, Lucene.Net throws an OutOfMemory exception.
> >
> > I think limiting the results, e.g. to 20K at most, could solve this
> > problem.
> >
> > Any solution is welcome.
> >
> > Thanks.
> > Regards.
> > Scott
>

Re: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?

Posted by Scott Zhang <ge...@gmail.com>.
Hi. I am using 2.3.2 rev 778263, compiled with VS2005.

In any case, in my view, the way the total result count is obtained and the
GetMoreDocs function need to be improved.

I think there should be a way to get the total number of results that works
like "select count(*) from [tablename]", which only returns a number. It
should not use a collection object to store all the search results.
Otherwise, as in my case, the search results will one day exceed the usable
memory.
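
A minimal sketch of that count-only idea, assuming the Collect(doc, score)
callback shape quoted earlier in this thread. It is standalone C#:
HitCollectorStandIn, CountingCollector and FakeScorer are hypothetical names
made up for this example, and FakeScorer only imitates the
"while (Next()) hc.Collect(...)" loop. With the real library, the collector
would presumably derive from HitCollector and be passed to the
search(Query, HitCollector) overload mentioned in the quoted message below.

```csharp
using System;

namespace CountSketch
{
    // Hypothetical stand-in for a per-hit callback such as HitCollector,
    // declared here only so the sketch compiles without Lucene.Net.
    public abstract class HitCollectorStandIn
    {
        public abstract void Collect(int doc, float score);
    }

    // Hypothetical collector that behaves like "select count(*)":
    // it increments a counter and stores nothing else.
    public sealed class CountingCollector : HitCollectorStandIn
    {
        private int count;

        public int Count { get { return count; } }

        public override void Collect(int doc, float score)
        {
            count++;
        }
    }

    // Hypothetical driver that calls the collector once per matching
    // document, the way a scorer loop would.
    public static class FakeScorer
    {
        public static void ScoreAll(int matchCount, HitCollectorStandIn hc)
        {
            for (int doc = 0; doc < matchCount; doc++)
            {
                hc.Collect(doc, 1.0f);
            }
        }
    }
}
```

The collector's state stays a single int no matter how many hits it is
given. Whether this would avoid the OutOfMemory exception on the real index
depends on where the memory actually goes, which this thread does not
establish.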


Regards.
Scott

On Fri, Jul 3, 2009 at 4:31 AM, Wagner Ignacio Pinto Junior <
wagneripjr@hotmail.com> wrote:

> Hi Scott,
>
> Which version and/or revision of Lucene.Net are you using?
>
> Anyway, what I was talking about is using the method
>
> search(Query query,HitCollector results)
>
> which loads all the search hits into memory. Bad idea.
>
> Search uses HitCollector because it pre-loads the first 100 hits.
>
> I debugged a search with 500 hits and it loaded only 100 docs, but it did
> read some of the index to get the scores and normalize them so that no doc
> scores above 1.0.
>
> I will pay more attention to memory consumption.
>
> I've compiled Lucene.Net 2.3.1.5 rev 756751 with VS2008 Team System.
>
> Sorry about my English :)
>
> Wagner
>

RE: Hi. Does anyone know how to solve the OutOfMemory Exception during Search?

Posted by Wagner Ignacio Pinto Junior <wa...@hotmail.com>.
Hi Scott,

Which version and/or revision of Lucene.Net are you using?

Anyway, what I was talking about is using the method

search(Query query,HitCollector results)

which loads all the search hits into memory. Bad idea.

Search uses HitCollector because it pre-loads the first 100 hits.

I debugged a search with 500 hits and it loaded only 100 docs, but it did
read some of the index to get the scores and normalize them so that no doc
scores above 1.0.

I will pay more attention to memory consumption.

I've compiled Lucene.Net 2.3.1.5 rev 756751 with VS2008 Team System.

Sorry about my English :)

Wagner
 