
Posted to java-user@lucene.apache.org by escobar5 <es...@spymac.com> on 2006/03/22 17:11:23 UTC

java.lang.OutOfMemoryError in lucene

Hello, 

I'm having a problem when searching in Lucene. I get a
java.lang.OutOfMemoryError: JVMXE004:OutOfMemoryError, stAllocArray for
executeJava failed.
My index is about 17MB. When I run the search on my PC it works fine, but
when I deploy it on the AIX server I get the error.

Can you tell me what the cause could be?

Thanks in advance.
--
View this message in context: http://www.nabble.com/java.lang.OutOfMemoryError-in-lucene-t1324911.html#a3535340
Sent from the Lucene - Java Users forum at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: java.lang.OutOfMemoryError in lucene

Posted by escobar5 <es...@spymac.com>.

Here is my search method, maybe it's something wrong with it:

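// Imports needed by the enclosing class:
import java.util.Vector;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public Vector buscar(String busqueda) throws Exception
  {
    Vector results = new Vector();
    // Open the index; the searcher holds resources until it is closed.
    IndexSearcher searcher = new IndexSearcher("/index");
    try
    {
      Analyzer analyzer = new StandardAnalyzer();
      QueryParser qp = new QueryParser("contents", analyzer);
      // Require every term to match (AND instead of the default OR).
      qp.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
      Query query = qp.parse(busqueda);
      System.out.println("Searching for: " + query.toString("contents"));
      Hits hits = searcher.search(query);

      for (int i = 0; i < hits.length(); i++)
      {
        Document doc = hits.doc(i);

        // Map the indexed Windows path (R:\...) to its public URL.
        String path = doc.get("path").replaceAll("R:",
            "http://informatica.suranet.com/SDI");
        path = path.replace('\\', '/');
        String nombreArchivo = path.substring(path.lastIndexOf("/") + 1);

        results.add(new ResultadoBusqueda(nombreArchivo, path));
      }
    }
    finally
    {
      // Always release the searcher, even if parsing or searching fails.
      searcher.close();
    }
    return results;
  }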

RE: java.lang.OutOfMemoryError in lucene

Posted by Koji Sekiguchi <ko...@m4.dion.ne.jp>.

> But I have the IBM JDK 1.4.2. Do you know if this version still has the
> problem?

I'm sorry, I don't know. But you can try it, and if it solves the
problem, you can add your experience to the FAQ :)

Koji





Re: Changing ranking

Posted by Leon Chaddock <le...@macranet.co.uk>.

Hi Chris,
You said:
"5 word occurrences in a 10 word document would probably score the same as 
those 5 words in a 20 word document"

OK, so if I set this option, would the number of occurrences become a major 
factor, so that:

a phrase occurring 1 time in a 3 word document would rank lower than a 
phrase occurring 5 times in a 30 word document, because there are more 
occurrences irrespective of the document size?

Have I understood correctly? I know there are other factors, but I am just 
interested in the general behaviour. It is very important to me to stop two 
word documents with the phrase occurring only once from ranking higher than 
100 word documents with the phrase occurring 10 times.

Many thanks for your help

Leon

----- Original Message ----- 
From: "Chris Hostetter" <ho...@fucit.org>
To: "Lucene Users" <ja...@lucene.apache.org>
Sent: Friday, March 24, 2006 6:47 PM
Subject: Re: Changing ranking


>
> (NOTE: replying back to java-user, for the reasons listed at
> http://people.apache.org/~hossman/#private_q )
>
> : Date: Fri, 24 Mar 2006 08:42:29 -0000
> : Subject: Re: Changing ranking
> :
> : Hi Chris,
> : Thanks, so would that make it as simple as a document with 5 matching
> : occurrences ranks higher than a document with 4 occurrences?
>
> Score calculations tend to be complicated, but if what you really care
> about is just the number of occurrences, then omitting norms is one way to
> start.
>
> : This should achieve my objective of showing slightly longer documents first
> : (really it doesn't actually have to be the longest, I just want to stop
> : documents with only two words ranking first)
>
> It won't actually make longer docs appear first -- it will just help
> ensure that there is no penalty for a doc being longer.  5 word occurrences
> in a 10 word document would probably score the same as those 5 words in a
> 20 word document; the order in which they come back might be determined by
> the order they were added to the index at that point.  How rare each term
> is in the index also comes into play -- if your BooleanQuery contains 10
> optional terms, and the 4 that appear the least frequently in your index
> appear in one document, and the other 6 appear in a different document --
> the doc with the 4 rare ones might wind up scoring higher.
>
> To really understand scoring you should do some experiments, and look at
> the Explanation information for your queries to understand how things like
> tf and idf impact your score.  Then you can think about how you might want
> to change your Similarity class to meet your needs.



Re: Changing ranking

Posted by Chris Hostetter <ho...@fucit.org>.

(NOTE: replying back to java-user, for the reasons listed at
http://people.apache.org/~hossman/#private_q )

: Date: Fri, 24 Mar 2006 08:42:29 -0000
: Subject: Re: Changing ranking
:
: Hi Chris,
: Thanks, so would that make it as simple as a document with 5 matching
: occurrences ranks higher than a document with 4 occurrences?

Score calculations tend to be complicated, but if what you really care
about is just the number of occurrences, then omitting norms is one way to
start.

: This should achieve my objective of showing slightly longer documents first
: (really it doesn't actually have to be the longest, I just want to stop
: documents with only two words ranking first)

It won't actually make longer docs appear first -- it will just help
ensure that there is no penalty for a doc being longer.  5 word occurrences
in a 10 word document would probably score the same as those 5 words in a
20 word document; the order in which they come back might be determined by
the order they were added to the index at that point.  How rare each term
is in the index also comes into play -- if your BooleanQuery contains 10
optional terms, and the 4 that appear the least frequently in your index
appear in one document, and the other 6 appear in a different document --
the doc with the 4 rare ones might wind up scoring higher.

To really understand scoring you should do some experiments, and look at
the Explanation information for your queries to understand how things like
tf and idf impact your score.  Then you can think about how you might want
to change your Similarity class to meet your needs.
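
A rough way to see the effect described above is to compute only the two factors that depend on occurrence count and document length. The sketch below assumes the classic DefaultSimilarity formulas (tf = sqrt(freq), lengthNorm = 1/sqrt(numTerms)) and deliberately ignores idf, boosts, coord, queryNorm, and the one-byte rounding of stored norms. The names NormSketch and score are made up for illustration and are not part of Lucene.

```java
// Illustrative only: a simplified model of two scoring factors,
// not Lucene's actual scoring code.
class NormSketch {
    // Term-frequency factor, assuming tf = sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length normalization, assuming lengthNorm = 1 / sqrt(numTerms).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // With norms omitted, the length factor is treated as a constant 1.
    static double score(int freq, int numTerms, boolean omitNorms) {
        double norm = omitNorms ? 1.0 : lengthNorm(numTerms);
        return tf(freq) * norm;
    }

    public static void main(String[] args) {
        // 1 hit in a 3 word doc vs 5 hits in a 30 word doc.
        System.out.println("with norms:    " + score(1, 3, false)
                + " vs " + score(5, 30, false));
        System.out.println("norms omitted: " + score(1, 3, true)
                + " vs " + score(5, 30, true));
        // 5 hits in a 10 word doc vs 5 hits in a 20 word doc.
        System.out.println("norms omitted: " + score(5, 10, true)
                + " vs " + score(5, 20, true));
    }
}
```

Under these assumptions the short document wins while norms are in play (about 0.58 against 0.41), and the document with more occurrences wins once norms are omitted (1.0 against about 2.24). The 10 word and 20 word documents with 5 occurrences each come out equal, which is why their order may fall back to index order.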





-Hoss



Re: Changing ranking

Posted by Chris Hostetter <ho...@fucit.org>.

: Is there any way I can change Lucene to rank longer documents with more
: phrase occurrences higher

If what you care about is only the number of occurrences, and you don't
want the length to be a factor at all, then using Field.setOmitNorms(true)
on the Field for every document you add will not only accomplish this, but
will also save one byte per field per document in your index.

That can add up if you have a lot of fields whose length you don't care
about.
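
To get a feel for "one byte per field per document", here is a back-of-the-envelope calculation. The document and field counts are invented for illustration, and the helper name normsBytes is hypothetical.

```java
// Back-of-the-envelope estimate of the space taken by norms.
class NormsCost {
    // One norm byte is kept for each (document, field) pair that has norms.
    static long normsBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        // Hypothetical index: one million documents, ten fields with norms.
        long bytes = normsBytes(1000000L, 10);
        System.out.println(bytes + " bytes of norms");
    }
}
```

For that hypothetical index the norms come to ten million bytes, which is the amount omitting them on every field would save.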


-Hoss



Re: Changing ranking

Posted by Marvin Humphrey <ma...@rectangular.com>.

On Mar 23, 2006, at 11:22 AM, Otis Gospodnetic wrote:

> The place to start would be to look at the DefaultSimilarity, and  
> the norms method there.  Perhaps you want to create your own  
> Similarity implementation that returns either a constant 1 or  
> something else that will favour longer text.  Somebody else with  
> more experience in this area may have better or more precise  
> suggestions.

Here's an implementation of lengthNorm() that stops the weighting at 100
tokens: any field shorter than that is treated as if it had 100 terms.

   public float lengthNorm(String fieldName, int numTerms) {
     numTerms = numTerms < 100 ? 100 : numTerms;
     return (float)(1.0 / Math.sqrt(numTerms));
   }

If you adopt it, you must boost short but important fields (e.g.  
title), or they won't contribute enough.
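
To see what the cap changes, the following stand-alone sketch compares the capped version with the uncapped 1/sqrt(numTerms) formula for the two example documents from this thread, counted as 4 and 12 raw words. Those counts ignore stop-word removal by the analyzer, and the class and method names are made up for illustration.

```java
// Compares uncapped and capped length normalization outside of Lucene.
class LengthNormSketch {
    // Uncapped formula: shorter fields get a larger factor.
    static float uncappedLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Capped variant: fields under 100 terms are treated as having 100.
    static float cappedLengthNorm(int numTerms) {
        int effectiveTerms = Math.max(numTerms, 100);
        return (float) (1.0 / Math.sqrt(effectiveTerms));
    }

    public static void main(String[] args) {
        int[] lengths = {4, 12, 400};
        for (int i = 0; i < lengths.length; i++) {
            int n = lengths[i];
            System.out.println(n + " terms: uncapped=" + uncappedLengthNorm(n)
                    + " capped=" + cappedLengthNorm(n));
        }
    }
}
```

Without the cap the 4 word document gets a factor of 0.5 against roughly 0.29 for the 12 word one. With the cap both get the same factor, so the number of occurrences decides between them, while fields longer than 100 terms are still penalized as before.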

KinoSearch (my loose Perl/C port of Lucene) uses this algorithm, and  
it seems to work well.

To see an earlier discussion on this subject perform a web search for  
"proposal defaultsimilarity lengthnorm".

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Changing ranking

Posted by Otis Gospodnetic <ot...@yahoo.com>.

The place to start would be to look at the DefaultSimilarity, and the norms method there.  Perhaps you want to create your own Similarity implementation that returns either a constant 1 or something else that will favour longer text.  Somebody else with more experience in this area may have better or more precise suggestions.

Otis

----- Original Message ----
From: Leon Chaddock <le...@macranet.co.uk>
To: java-user@lucene.apache.org
Sent: Thursday, March 23, 2006 9:43:14 AM
Subject: Changing ranking

Hi,
At present Lucene seems to rank very short documents over longer documents 
in which the phrase occurs more often. For instance, with the search term 
"cat",

"the cat went home"

ranks higher than

"the black cat when home past some other cats, on cat street"

Is there any way I can change Lucene to rank longer documents with more 
phrase occurrences higher?

Many thanks

Leon 



Changing ranking

Posted by Leon Chaddock <le...@macranet.co.uk>.

Hi,
At present Lucene seems to rank very short documents over longer documents 
in which the phrase occurs more often. For instance, with the search term 
"cat",

"the cat went home"

ranks higher than

"the black cat when home past some other cats, on cat street"

Is there any way I can change Lucene to rank longer documents with more 
phrase occurrences higher?

Many thanks

Leon 



RE: java.lang.OutOfMemoryError in lucene

Posted by escobar5 <es...@spymac.com>.

But I have the IBM JDK 1.4.2. Do you know if this version still has the
problem?

RE: java.lang.OutOfMemoryError in lucene

Posted by Koji Sekiguchi <ko...@m4.dion.ne.jp>.

> What else could it be? Maybe the IBM JVM?

I'm not sure this is the case, but there is an entry about the IBM JDK
in the FAQ. Please read:

Why can't I use Lucene with IBM JDK 1.3.1?
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-1416be459d0bb822360b058aac3c2ccf8ecc133e

regards,

Koji





Re: java.lang.OutOfMemoryError in lucene

Posted by escobar5 <es...@spymac.com>.

I think the problem is not memory, because I just tried searching an
11k index that contains only one document, and I still get the same error.

What else could it be? Maybe the IBM JVM?

Re: java.lang.OutOfMemoryError in lucene

Posted by Olivier Jaquemet <ol...@jalios.com>.

Then you should probably try to increase them to a higher value to see 
if the problem still occurs.
The memory consumption on your production server is probably much higher 
than what you are used to on your development platform.

escobar5 wrote:
> I forgot to mention: I've already checked that, and they are:
>
>  -Xms = 306m
>  -Xmx = 320m 


-- 
Olivier Jaquemet <ol...@jalios.com>
Ingénieur R&D Jalios S.A.
Tel: 01.39.23.92.83
http://www.jalios.com/
http://support.jalios.com/





Re: java.lang.OutOfMemoryError in lucene

Posted by escobar5 <es...@spymac.com>.

I forgot to mention: I've already checked that, and they are:

 -Xms = 306m
 -Xmx = 320m 

Re: java.lang.OutOfMemoryError in lucene

Posted by Olivier Jaquemet <ol...@jalios.com>.

You should probably increase the memory allocated to the JVM using Java 
options such as
-Xms128m -Xmx256m
(this allocates 128 MB of heap at startup, which can grow to a maximum of 
256 MB)
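
Before raising the values, it can help to confirm which limits the JVM on the server actually picked up, since the options may be set somewhere other than where you expect. A minimal sketch using only java.lang.Runtime follows; the class name HeapReport is made up for illustration.

```java
// Prints the heap limits the running JVM is actually using.
class HeapReport {
    static String describe(Runtime rt) {
        long mb = 1024L * 1024L;
        // maxMemory: upper bound the heap may grow to (roughly -Xmx)
        // totalMemory: heap currently reserved by the JVM
        // freeMemory: unused part of the currently reserved heap
        return "max=" + (rt.maxMemory() / mb) + "MB"
                + " total=" + (rt.totalMemory() / mb) + "MB"
                + " free=" + (rt.freeMemory() / mb) + "MB";
    }

    public static void main(String[] args) {
        System.out.println(describe(Runtime.getRuntime()));
    }
}
```

Running this inside the same JVM that performs the search (for example by logging describe(Runtime.getRuntime()) just before the search call) shows whether the heap settings took effect. If the reported maximum is already generous, the error is less likely to be a plain heap shortage.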

escobar5 wrote:
> Hello, 
>
> I'm having a problem when searching in Lucene. I get a
> java.lang.OutOfMemoryError: JVMXE004:OutOfMemoryError, stAllocArray for
> executeJava failed.
> My index is about 17MB. When I run the search on my PC it works fine, but
> when I deploy it on the AIX server I get the error.
>
> Can you tell me what the cause could be?
>
> Thanks in advance.


-- 
Olivier Jaquemet <ol...@jalios.com>
Ingénieur R&D Jalios S.A.
Tel: 01.39.23.92.83
http://www.jalios.com/
http://support.jalios.com/



