You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Gordon Riggs <gr...@siriusventures.com> on 2004/10/21 08:16:12 UTC

Looking for consulting help on project

Hi,
 
I am working on a web development project using PHP and mySQL. The team has
implemented full text search with mySQL, but is now researching Lucene to
help with performance/scalability issues. The team is looking for a
developer who has experience working with Lucene and can assist with
integrating into our environment. What follows is a brief overview of the
problems that we're working to address. If you have the experience with
using Lucene with large amounts of data (we have roughly 16 million records)
where search time is critical (needs to be under .2 seconds), then please
respond.
 
Thanks,
Gordon Riggs
griggs@siriusventures.com
 
1. Loading index into memory using Lucene's RAMDirectory
Why is the Java heap 2.9GB for a 1.4GB index?
Why can we not load an index over 1.4GB in size?  We receive
'java.lang.OutOfMemoryError' even with the -mx flag set to as high as '10g'.
We're using a dedicated test machine which has dual AMD Opteron processors
and 12GB of memory.  The OS is SuSE Linux Enterprise Server 9 (x86_64).  The
java version is: Java(TM) 2 Runtime Environment, Standard Edition (build
Blackdown-1.4.2) Java HotSpot(TM) 64-Bit Server VM (build
Blackdown-1.4.2-fcs, mixed mode)
We also get similar results with: Java(TM) 2 Runtime Environment, Standard
Edition (build 1.4.2_03-b02) Java HotSpot(TM) Client VM (build 1.4.2_03-b02,
mixed mode)

2. How to keep Lucene and Java in memory, to improve performance
The idea is to have a Lucene "daemon" that loads the index into memory once
on startup. It then listens for connections and performs search requests for
clients using that single index instance.
Do you foresee any problems (other than the ones stated above) with this
approach?
Garbage collection and/or memory leaks?  Performance issues?  
Concurrency issues with multiple searches coming in at once?
What's involved in writing the daemon?
Assuming that we need the daemon, we need to find out how big a job it is to
develop, what requirements need to be specified, etc.

3. How to interface our PHP web application with Java
Our web application is written in PHP so we need a communication interface
for performing search queries that is both PHP and Java friendly.
What do you think would be a good solution?  XML-RPC?
What's involved in developing the solution?

4. How to tune Lucene
Are there ways to "tune" Lucene in order to improve performance? We already
plan on moving the index into memory.
What else can be done to improve the search times? Can the way the index is
built affect performance?


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Looking for consulting help on project

Posted by David Spencer <da...@tropo.com>.
Suggestions....

[a]

Try invoking the VM w/ an option like "-XX:CompileThreshold=100" or even 
a smaller number. This encourages the hotspot VM to compile methods 
sooner, thus the app will take less time to "warm up".

http://java.sun.com/docs/hotspot/VMOptions.html#additional

You might want to search the web for refs to this, esp how things like 
Eclipse is brought up, as I think their invocation script sets other 
obscure options to guide GC too.

[b]

Any time I've worked w/ a hard core java server I've always found it 
helpful to have a loop explicitly trying to force gc - this is the idiom 
I use (i.e. you may have to do more than just System.gc()), and my 
suggestion is to try calling this every 15-60 secs so that memory use 
never jumps. I know that in theory you should never need to, but it may 
help.

	public static long gc()
	{
		long bef = mem();
		System.gc();
		sleep( 100);
		System.runFinalization();
		sleep( 100);
		System.gc();
		long aft= mem();
		return aft-bef;
	}


Gordon Riggs wrote:

> Hi,
>  
> I am working on a web development project using PHP and mySQL. The team has
> implemented full text search with mySQL, but is now researching Lucene to
> help with performance/scalability issues. The team is looking for a
> developer who has experience working with Lucene and can assist with
> integrating into our environment. What follows is a brief overview of the
> problems that we're working to address. If you have the experience with
> using Lucene with large amounts of data (we have roughly 16 million records)
> where search time is critical (needs to be under .2 seconds), then please
> respond.
>  
> Thanks,
> Gordon Riggs
> griggs@siriusventures.com
>  
> 1. Loading index into memory using Lucene's RAMDirectory
> Why is the Java heap 2.9GB for a 1.4GB index?
> Why can we not load an index over 1.4GB in size?  We receive
> 'java.lang.OutOfMemoryError' even with the -mx flag set to as high as '10g'.
> We're using a dedicated test machine which has dual AMD Opteron processors
> and 12GB of memory.  The OS is SuSE Linux Enterprise Server 9 (x86_64).  The
> java version is: Java(TM) 2 Runtime Environment, Standard Edition (build
> Blackdown-1.4.2) Java HotSpot(TM) 64-Bit Server VM (build
> Blackdown-1.4.2-fcs, mixed mode)
> We also get similar results with: Java(TM) 2 Runtime Environment, Standard
> Edition (build 1.4.2_03-b02) Java HotSpot(TM) Client VM (build 1.4.2_03-b02,
> mixed mode)
> 
> 2. How to keep Lucene and Java in memory, to improve performance
> The idea is to have a Lucene "daemon" that loads the index into memory once
> on startup. It then listens for connections and performs search requests for
> clients using that single index instance.
> Do you foresee any problems (other than the ones stated above) with this
> approach?
> Garbage collection and/or memory leaks?  Performance issues?  
> Concurrency issues with multiple searches coming in at once?
> What's involved in writing the daemon?
> Assuming that we need the daemon, we need to find out how big a job it is to
> develop, what requirements need to be specified, etc.
> 
> 3. How to interface our PHP web application with Java
> Our web application is written in PHP so we need a communication interface
> for performing search queries that is both PHP and Java friendly.
> What do you think would be a good solution?  XML-RPC?
> What's involved in developing the solution?
> 
> 4. How to tune Lucene
> Are there ways to "tune" Lucene in order to improve performance? We already
> plan on moving the index into memory.
> What else can be done to improve the search times? Can the way the index is
> built affect performance?
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org