Posted to java-user@lucene.apache.org by dizh <di...@neusoft.com> on 2013/01/29 06:54:28 UTC

Large Index Query Help!

Hi All:

I have a large index, 47 GB in size (yes, 47 GB). When I search it, the process hangs. Tracing its execution shows:

        at org.apache.lucene.search.TopFieldCollector.add(TopFieldCollector.java:1178)
        at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.collect(TopFieldCollector.java:87)
        at org.apache.lucene.search.Scorer.score(Scorer.java:62)
        at org.apache.lucene.search.ConstantScoreQuery$ConstantScorer.score(ConstantScoreQuery.java:238)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:555)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:484)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)

I think this is because my index is too large and the whole JVM heap is used up, so I have some questions.

How can I reduce the memory that Lucene uses? The output of the Linux top command is as follows:

top - 13:45:11 up  2:56,  3 users,  load average: 0.59, 0.33, 0.51
Tasks: 193 total,   1 running, 192 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.2%us,  0.2%sy,  0.0%ni, 86.5%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3933684k total,  3793900k used,   139784k free,    21704k buffers
Swap: 20482864k total,   490644k used, 19992220k free,  1408172k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
 5318 root      21   0 47.7g 1.6g 267m S 99.9 41.4   0:50.84 java                                                                   
 5536 root      15   0  596m  67m 5856 S  3.0  1.8   0:26.22 python    

Lucene is clearly loading the whole index into memory. I also know that the memory is returned to the OS after the query finishes, but while this query runs, the process hangs.

Could anyone suggest how to query a large index on a single machine, without distributing it?

 
---------------------------------------------------------------------------------------------------
Confidentiality Notice: The information contained in this e-mail and any accompanying attachment(s) 
is intended only for the use of the intended recipient and may be confidential and/or privileged of 
Neusoft Corporation, its subsidiaries and/or its affiliates. If any reader of this communication is 
not the intended recipient, unauthorized use, forwarding, printing,  storing, disclosure or copying 
is strictly prohibited, and may be unlawful.If you have received this communication in error,please 
immediately notify the sender by return e-mail, and delete the original message and all copies from 
your system. Thank you. 
---------------------------------------------------------------------------------------------------

Re: Large Index Query Help!

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: Large Index Query Help!
: References: <13...@n3.nabble.com>

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message; instead, start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to, so your question is "hidden" in that thread and gets less 
attention.  It also makes following discussions in the mailing list 
archives particularly difficult.



-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Re: Large Index Query Help!

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> 4 JVM flags: -Xms512m -Xmx1576m
> 5 Other apps don't occupy much memory

As Ian said, read this blog post and you will see that Lucene is not eating your memory. The "RES" column in top shows the actual memory usage (resident memory):

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                
 5318 root      21   0 47.7g 1.6g 267m S 99.9 41.4   0:50.84 java                                                                   

And that is only 1.6 GB, which matches your -Xmx. The "VIRT" column only shows address space, and that’s more or less unlimited on 64-bit operating systems. Please read this (it is important for understanding your top output): http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
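
The VIRT versus RES distinction can be tried out with plain java.nio, without Lucene. This is only a minimal sketch, assuming the index is opened through a memory-mapped directory as the 47.7g VIRT figure suggests. The class name MmapDemo, its helper methods, and the 256 MB sparse temp file are made up for illustration and are not part of Lucene.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class MmapDemo {

    /** Creates a sparse temp file of the given size and maps it read-only. */
    static MappedByteBuffer mapSparseFile(long sizeBytes) {
        try {
            Path file = Files.createTempFile("mmap-demo", ".bin");
            file.toFile().deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
                 FileChannel channel = raf.getChannel()) {
                raf.setLength(sizeBytes); // hypothetical stand-in for a large index file
                // map() reserves address space (counted in VIRT). Pages only
                // become resident (RES) when they are actually read.
                // The mapping stays valid after the channel is closed.
                return channel.map(FileChannel.MapMode.READ_ONLY, 0, sizeBytes);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Approximate heap currently in use by this JVM. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long size = 256L * 1024 * 1024;
        long before = usedHeapBytes();
        MappedByteBuffer buf = mapSparseFile(size);
        long after = usedHeapBytes();
        System.out.println("mapped bytes: " + buf.capacity());
        System.out.println("heap delta:   " + (after - before) + " bytes");
        System.out.println("first byte:   " + buf.get(0));
    }
}
```

Running this and watching the process in top should show VIRT grow by roughly the mapped size while the Java heap stays within -Xmx, because the mapped region lives outside the heap.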

The reason your system stops responding may be a deadlock in your openSearcher method. I think there may be too much synchronization: acquire() needs no synchronization. This is as bad as synchronizing IndexWriter calls. Those classes are documented to be thread-safe, and using them with extra synchronization hurts performance or can make things stop working. Can you post the thread stacks when it is hung? The trace you posted has no locks involved.
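
One way to capture the thread stacks, including which locks each thread holds, is the JDK's own management API. The following is only a sketch: the class name ThreadDumpDemo and its two helpers are hypothetical, and it would have to run inside the hung JVM (for example from a watchdog thread). Running the JDK's jstack tool against the Java process is a common alternative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumpDemo {

    /** Returns a dump of all live threads, with held monitors and synchronizers. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true: include locked monitors and locked ownable synchronizers
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            // ThreadInfo.toString() prints state, lock owner, and stack frames
            // (it may shorten very deep stacks).
            out.append(info);
        }
        return out.toString();
    }

    /** Returns how many threads the JVM reports as deadlocked (0 if none). */
    static int countDeadlockedThreads() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
        System.out.print(dumpAllThreads());
    }
}
```

If the count is zero and one thread sits in TopFieldCollector at 99.9% CPU, the process is more likely busy collecting hits than deadlocked.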

Uwe





Re: Re: Large Index Query Help!

Posted by dizh <di...@neusoft.com>.
OK, here is my setup.

1 OS: Red Hat 5
2 JVM: 64-bit JDK 1.7
3 Lucene 4.0
4 JVM flags: -Xms512m -Xmx1576m
5 Other apps don't occupy much memory


My code is as follows:

    public synchronized IndexSearcher openSearcher() {
        if (QueryUtil.indexExist(SearchEnv.searchEnv.indexDir)
            && SearchEnv.searchEnv.status != IndexStatus.StopWorking) {
            try {
                // This is how I get the Lucene searcher
                IndexSearcher searcher = SearchEnv.searchEnv.indexBuilder
                    .getNRTManager().acquire();
                return searcher;
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
            }
        }
        return null;
    }

I use Lucene's NRTManager to get the searcher; I think that is OK.

My searcher uses only one directory. All my logs are indexed into that directory.

                TopDocs docs = searcher.search(query, null, number, sort);
                int total = docs.totalHits;
                if (total == 0) {
                    return 0;
                }
                int number2 = end - start + 1;
                TopDocs tops = null;
                if (docs.scoreDocs.length > 1) {
                    if (start == 0)
                        tops = searcher.searchAfter(docs.scoreDocs[start],
                            query, number2, sort);
                    else
                        tops = searcher.searchAfter(docs.scoreDocs[start - 1],
                            query, number2, sort);
                }

start and end are used for paging, like SQL LIMIT.

I think the problem is that I use searcher.search to find the total and then use searchAfter to fetch the page.

But if I skip the first step, how can I do paging efficiently? Or should I use a custom Collector?

I am new to Lucene, so please forgive my ignorance.
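
On the paging question, one common approach is to run a single search for the top (end + 1) hits and slice the requested window out of that one result, instead of calling search and then searchAfter. A minimal sketch of just the slicing step, in plain Java: PagingDemo and page are hypothetical names, and the list stands in for docs.scoreDocs from one searcher.search(query, null, end + 1, sort) call, whose docs.totalHits would still give the total. Note also that in the code above, searchAfter returns only hits ranked after the document passed in, so for start == 0 the first hit is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagingDemo {

    /**
     * Returns the hits at positions start..end (inclusive) from an already
     * sorted top-N list, clamped to however many hits are available.
     */
    static <T> List<T> page(List<T> topHits, int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("need 0 <= start <= end");
        }
        int from = Math.min(start, topHits.size());
        int to = Math.min(end + 1, topHits.size());
        return new ArrayList<>(topHits.subList(from, to));
    }

    public static void main(String[] args) {
        // Stand-in for the sorted doc ids of the top 5 hits.
        List<Integer> topHits = Arrays.asList(10, 11, 12, 13, 14);
        System.out.println(page(topHits, 1, 3));
    }
}
```

The cost of this approach grows with end, since the collector has to keep end + 1 entries, so for very deep pages searchAfter with the last hit of the previous page remains the cheaper option.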







Re: Large Index Query Help!

Posted by Ian Lea <ia...@gmail.com>.
Lucene won't load the whole index into memory.  See
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

What version of Lucene?

How are you opening index readers?

How are you searching?

How much memory are you giving the JVM?

What else in your app is using all the memory?

What else is going on on your server?


--
Ian.

