Posted to dev@lucene.apache.org by "Mark Harwood (JIRA)" <ji...@apache.org> on 2005/11/28 21:00:37 UTC

[jira] Updated: (LUCENE-474) High Frequency Terms/Phrases at the Index level

     [ http://issues.apache.org/jira/browse/LUCENE-474?page=all ]

Mark Harwood updated LUCENE-474:
--------------------------------

    Attachment: colloc.zip

Here's some code that I've used before to find phrases in an index - see CollocationFinder.java.
If your index has term vector support enabled, you can run it to mine the collocated terms. This is typically a long-running operation that you don't want to do too often.
The CollocationIndexer can be used to store the mined collocations in an index.
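
A minimal sketch of the general idea behind mining collocations, assuming each document's terms are available in position order (as a term vector stored with positions would provide). This is not the code in CollocationFinder.java; the class and method names are hypothetical. It ranks pairs by raw co-occurrence count, whereas a real collocation finder would more likely normalise against the individual term frequencies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: counts how often one term follows another within
 * `window` positions. Each document is a list of terms in position order,
 * standing in for a positional term vector (an assumption, not Lucene API).
 */
public class CollocationSketch {

    /** Counts ordered pairs "a b" where b occurs within `window` positions after a. */
    public static Map<String, Integer> countPairs(List<List<String>> docs, int window) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : docs) {
            for (int i = 0; i < terms.size(); i++) {
                int end = Math.min(terms.size(), i + window + 1);
                for (int j = i + 1; j < end; j++) {
                    counts.merge(terms.get(i) + " " + terms.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** Keeps only the pairs seen at least `minCount` times. */
    public static Map<String, Integer> frequentPairs(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("flights", "to", "new", "york"),
            List.of("new", "york", "hotels"),
            List.of("york", "new"));
        // Window of 1 = adjacent terms only; "york new" is counted separately from "new york".
        System.out.println(frequentPairs(countPairs(docs, 1), 2)); // {new york=2}
    }
}
```

Because pairs are ordered, "new york" and "york new" are counted separately, which is what you want if the result is later used to build phrase queries.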

Possible uses for collocations are:
* automatically identifying candidate terms in a query that can be turned into a phrase query
* better spelling correction, using all the terms in the query as context to pick the most likely spelling variation
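
One way the first use could work, sketched under assumptions: the mined collocations are available as a hypothetical in-memory table mapping "term1 term2" to a count, and adjacent query terms whose pair count reaches a threshold are merged into one group that could then be issued as a phrase query. PhraseSuggester and groupPhrases are illustrative names, not part of Lucene or the attached code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: groups adjacent query terms into candidate phrases
 * using a collocation table of the (assumed) form "a b" -> count.
 */
public class PhraseSuggester {

    public static List<String> groupPhrases(List<String> queryTerms,
                                            Map<String, Integer> collocations,
                                            int minCount) {
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < queryTerms.size(); i++) {
            String term = queryTerms.get(i);
            if (i > 0) {
                String pair = queryTerms.get(i - 1) + " " + term;
                if (collocations.getOrDefault(pair, 0) >= minCount) {
                    current.append(' ').append(term); // strong collocation: extend the phrase
                    continue;
                }
                groups.add(current.toString()); // weak link: close the previous group
                current.setLength(0);
            }
            current.append(term);
        }
        if (!queryTerms.isEmpty()) {
            groups.add(current.toString()); // flush the last group
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = Map.of("new york", 12, "york hotels", 1);
        System.out.println(groupPhrases(List.of("cheap", "new", "york", "hotels"), table, 5));
        // [cheap, new york, hotels]
    }
}
```

Any group containing a space is a candidate for a phrase query; single-term groups stay as ordinary term clauses.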

Haven't done too much with this code, but I've added it here because it sounds like it could be relevant.

Cheers
Mark



> High Frequency Terms/Phrases at the Index level
> -----------------------------------------------
>
>          Key: LUCENE-474
>          URL: http://issues.apache.org/jira/browse/LUCENE-474
>      Project: Lucene - Java
>         Type: New Feature
>     Versions: 1.4
>     Reporter: Suri Babu B
>  Attachments: colloc.zip
>
> We should be able to find all the high-frequency terms/phrases (where frequency is the search criterion / benchmark).
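
For the single-term half of this request, a minimal sketch that keeps the N most frequent terms with a bounded min-heap. It assumes "frequency" means document frequency and that the term-to-frequency map has already been collected by walking the index's term dictionary; TopTerms and topN are hypothetical names. Phrases could be ranked the same way by feeding in pair counts from a collocation miner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: selects the N terms with the highest frequency.
 * The input map (term -> docFreq) stands in for a pass over the index's terms.
 */
public class TopTerms {

    public static List<String> topN(Map<String, Integer> docFreqs, int n) {
        // Min-heap by frequency: the root is the weakest of the current top N.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // evict the lowest-frequency term
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // ascending frequency
        }
        Collections.reverse(result); // highest frequency first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = Map.of("lucene", 40, "index", 25, "the", 90, "phrase", 3);
        System.out.println(topN(df, 2)); // [the, lucene]
    }
}
```

The heap never holds more than N + 1 entries, so memory stays bounded even when the term dictionary is large.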

-- 
This message is automatically generated by JIRA.