Posted to java-user@lucene.apache.org by Bong My <bo...@gmail.com> on 2005/01/19 18:34:45 UTC

Fwd: Do Lucene applicable to text analyst for IRC?? how??

---------- Forwarded message ----------
From: Bong My <bo...@gmail.com>
Date: Thu, 20 Jan 2005 01:12:55 +0800
Subject: Do Lucene applicable to text analyst for IRC?? how??
To: lucene-user-get.123_145@jakarta.apache.org


I am using Lucene 1.4.2 to build my system. However, I am not building
a search engine of the kind Lucene is normally used for, where search
results are displayed based on the user's query. What I want to build
is a text analyzer that retrieves chat discussions from Internet Relay
Chat (IRC) and analyzes them to find the discussion topics of the
chatrooms.

First, I pre-process the text, i.e. stop-word removal and stemming,
and this is done with Lucene 1.4.2. I have managed to get the stemmed
words from the chatrooms and store them in a database. My problem is
the steps after stopping and stemming: I do not know how to write the
code that adds a document for every chatroom and builds the
document-term frequency matrix from those documents, and that also
calculates the term weights and inverse document frequency (idf) and
presents the document-term weights as a matrix.
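
The matrix described above can be sketched in plain Java without any
Lucene calls. This is only an illustration under stated assumptions:
the class name TfIdfSketch, its method names, and the sample stemmed
tokens are all hypothetical, each chatroom is treated as one document,
and the weight tf * ln(N / df) is just one common textbook choice, not
necessarily the formula Lucene itself uses for scoring.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: builds a document-term
// matrix from already stopped and stemmed tokens, one list per chatroom.
class TfIdfSketch {

    /** Sorted list of every distinct term; defines the matrix columns. */
    static List<String> vocabulary(List<List<String>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (List<String> doc : docs) {
            terms.addAll(doc);
        }
        return new ArrayList<>(terms);
    }

    /** Raw counts: tf[i][j] = occurrences of term j in document i. */
    static int[][] termFrequencies(List<List<String>> docs, List<String> vocab) {
        Map<String, Integer> column = new HashMap<>();
        for (int j = 0; j < vocab.size(); j++) {
            column.put(vocab.get(j), j);
        }
        int[][] tf = new int[docs.size()][vocab.size()];
        for (int i = 0; i < docs.size(); i++) {
            for (String term : docs.get(i)) {
                Integer j = column.get(term);
                if (j != null) {
                    tf[i][j]++;
                }
            }
        }
        return tf;
    }

    /** idf[j] = ln(N / df[j]), where df[j] = documents containing term j. */
    static double[] inverseDocumentFrequencies(int[][] tf) {
        int n = tf.length;
        int cols = (n == 0) ? 0 : tf[0].length;
        double[] idf = new double[cols];
        for (int j = 0; j < cols; j++) {
            int df = 0;
            for (int i = 0; i < n; i++) {
                if (tf[i][j] > 0) {
                    df++;
                }
            }
            idf[j] = (df == 0) ? 0.0 : Math.log((double) n / df);
        }
        return idf;
    }

    /** Weighted matrix: w[i][j] = tf[i][j] * idf[j]. */
    static double[][] weights(int[][] tf, double[] idf) {
        double[][] w = new double[tf.length][idf.length];
        for (int i = 0; i < tf.length; i++) {
            for (int j = 0; j < idf.length; j++) {
                w[i][j] = tf[i][j] * idf[j];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Made-up stemmed tokens for three chatrooms.
        List<List<String>> rooms = Arrays.asList(
                Arrays.asList("java", "lucen", "index", "java"),
                Arrays.asList("footbal", "goal", "java"),
                Arrays.asList("lucen", "search", "index"));

        List<String> vocab = vocabulary(rooms);
        int[][] tf = termFrequencies(rooms, vocab);
        double[][] w = weights(tf, inverseDocumentFrequencies(tf));

        System.out.println("terms: " + vocab);
        for (int i = 0; i < w.length; i++) {
            System.out.println("room " + i + ": " + Arrays.toString(w[i]));
        }
    }
}
```

With this weighting, a term that appears in every chatroom gets an idf
of zero and so drops out of the matrix, which may or may not be what is
wanted for topic detection.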

My question is: is this possible with Lucene 1.4.2? If yes, could
anyone please give me some sample code for it? I have read the
org.apache.lucene.index package, which has classes with names like
TermFreqVector, but I do not really understand how these classes are
implemented or how to call them to suit my system. Which class or
interface should I call first, and what are the steps? What does the
code to add documents look like?

Thanks to anyone who kindly helps and replies.
