Posted to java-user@lucene.apache.org by "MOYSE Gilles (Cetelem)" <gi...@cetelem.fr> on 2003/10/21 10:00:41 UTC

Compound expression extraction

Hi.

I'm trying to extract expressions from term position information, i.e.,
if two words frequently appear side by side, we can treat the two words
as a single term. For instance, if 'Object' and 'Oriented' appear
side by side 9 times out of 10, we can define a new expression,
'Object_Oriented'.
Does anyone know a statistical method for detecting such expressions?
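
To make the "9 times out of 10" idea concrete, here is a minimal sketch in
plain Java. It does not use any Lucene API; it assumes the text is already
available as an ordered list of tokens. The class name CompoundFinder, the
method findCompounds and both thresholds are hypothetical names chosen for
illustration. It is only a simple co-occurrence ratio (pair count divided by
the count of the more frequent of the two words), which is one possible
reading of the rule; measures such as pointwise mutual information are common
alternatives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
final class CompoundFinder {

    /**
     * Returns "first_second" for every adjacent word pair that was seen at
     * least minCount times and whose count, divided by the count of the more
     * frequent of its two words, is at least minRatio.
     */
    static List<String> findCompounds(List<String> tokens, double minRatio, int minCount) {
        Map<String, Integer> termCounts = new HashMap<>();
        // A two-element list is used as the pair key; LinkedHashMap keeps
        // the pairs in order of first appearance.
        Map<List<String>, Integer> pairCounts = new LinkedHashMap<>();

        for (int i = 0; i < tokens.size(); i++) {
            termCounts.merge(tokens.get(i), 1, Integer::sum);
            if (i + 1 < tokens.size()) {
                pairCounts.merge(Arrays.asList(tokens.get(i), tokens.get(i + 1)), 1, Integer::sum);
            }
        }

        List<String> compounds = new ArrayList<>();
        for (Map.Entry<List<String>, Integer> entry : pairCounts.entrySet()) {
            String first = entry.getKey().get(0);
            String second = entry.getKey().get(1);
            int together = entry.getValue();
            // Assumption: both words must occur mostly inside the pair, so
            // the ratio is taken against the more frequent word.
            int larger = Math.max(termCounts.get(first), termCounts.get(second));
            if (together >= minCount && (double) together / larger >= minRatio) {
                compounds.add(first + "_" + second);
            }
        }
        return compounds;
    }
}
```

Calling findCompounds(tokens, 0.9, 2) keeps only pairs seen at least twice
whose words occur together at least 90% of the time. The minimum count is
there so that two rare words that happen to appear once, next to each other,
are not reported as an expression.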

Thanks.

Gilles Moyse

-----Original Message-----
From: Eric Jain [mailto:Eric.Jain@isb-sib.ch]
Sent: Tuesday, October 21, 2003 09:24
To: Lucene Users List
Subject: Re: Lucene on Windows


> The CVS version of Lucene has a patch that allows one to use a
> 'Compound Index' instead of the traditional one.  This reduces the
> number of open files.  For more info, see/make the Javadocs for
> IndexWriter.

Interesting option. Do you have a rough idea of what the performance
impact of using this setting is?

--
Eric Jain

