Posted to java-user@lucene.apache.org by Vinay Kakade <vi...@yahoo.com> on 2002/11/15 07:02:30 UTC

extracting top k frequently occurring terms from a given set of documents

Hi,
I want to use Lucene to extract the 10 most frequently
occurring terms from a given set of HTML documents.
Please let me know how Lucene can be used for this
purpose. Specifically, how can I get the most frequently
occurring terms after building an index on the
documents with the Lucene indexer?
Please help me.
Regards,
Vinay.

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: extracting top k frequently occurring terms from a given set of documents

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Yes, I was going to post it.
I kept a copy, of course :)  So I'll stick it in CVS somewhere soon.

Otis

--- Doug Cutting <cu...@lucene.com> wrote:
> There was a class in the test directory that efficiently computed this,
> but I think Otis recently removed it.  Perhaps it should be revived and
> go in the sandbox or something...
>
> http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgNo=2620
>
> Doug

Re: extracting top k frequently occurring terms from a given set of documents

Posted by Doug Cutting <cu...@lucene.com>.

There was a class in the test directory that efficiently computed this, 
but I think Otis recently removed it.  Perhaps it should be revived and 
go in the sandbox or something...

http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene-dev@jakarta.apache.org&msgNo=2620

Doug
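
For readers who want a starting point: the class mentioned above is not shown in this thread, so what follows is only a minimal sketch of the general idea, not that class. It assumes you have already collected a map from each term to its frequency (the map, the class name TopTerms and the method topK are all hypothetical names for illustration). In Lucene versions of this era such a map could plausibly be filled by walking the TermEnum returned by IndexReader.terms() and reading docFreq() for each term; that step is left out here so the example runs without Lucene. The selection itself keeps a min-heap of at most k entries, so the whole term list never has to be sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: picks the k most frequent terms
// from a term -> frequency map that the caller has already built.
final class TopTerms {

    /** Returns up to k terms, most frequent first; ties go to the alphabetically earlier term. */
    static List<String> topK(Map<String, Integer> freqs, int k) {
        List<String> result = new ArrayList<>();
        if (k <= 0 || freqs.isEmpty()) {
            return result;
        }

        // "Weakest first" order: lower frequency is weaker; on equal
        // frequency the alphabetically later term is weaker.
        Comparator<Map.Entry<String, Integer>> weakestFirst = (a, b) -> {
            int byFreq = Integer.compare(a.getValue(), b.getValue());
            return byFreq != 0 ? byFreq : b.getKey().compareTo(a.getKey());
        };

        // Min-heap whose head is the weakest of the candidates kept so far.
        int capacity = Math.min(k, freqs.size());
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(capacity, weakestFirst);

        for (Map.Entry<String, Integer> e : freqs.entrySet()) {
            if (heap.size() < k) {
                heap.add(e);
            } else if (weakestFirst.compare(e, heap.peek()) > 0) {
                heap.poll();   // drop the current weakest
                heap.add(e);   // keep the stronger newcomer
            }
        }

        // The heap drains weakest first, so reverse to get most frequent first.
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts purely for illustration.
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lucene", 5);
        freqs.put("index", 3);
        freqs.put("term", 7);
        freqs.put("doc", 1);
        System.out.println(topK(freqs, 2));
    }
}
```

With n distinct terms this does O(n log k) work and holds only k entries at a time, which matters when the index has many more terms than the 10 being asked for. Note that common words will dominate the result unless stop words are filtered out at indexing time or skipped while filling the map.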

