Posted to solr-user@lucene.apache.org by lovely kasi <ka...@gmail.com> on 2013/10/13 16:42:49 UTC

Clustering unstructured text data

Hi,

I have gone through the Solr tutorial, but I could only find examples of
indexing JSON data.
I want to index and cluster unstructured text data.
For example, I have a folder containing 10 text files, where each text file
contains 10 lines of text from a conversation between a customer and an
executive.
I want each file (i.e. all 10 lines) to be treated as a single document
and indexed as one.
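A minimal sketch of the "one file = one document" step, written in Java because SolrJ is a Java client. It uses only the JDK and does not talk to Solr: every regular file in a folder is read and all of its lines are joined into a single text body, keyed by file name. The class and method names are hypothetical. Sending each entry to Solr is left out; with SolrJ one would typically build a `SolrInputDocument` per entry and call `addField` with field names (e.g. a hypothetical `id` and `text`) that must already exist in your schema.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: turns a folder of text files into one document per file.
public class FolderToDocuments {

    /**
     * Returns one entry per regular file in dir:
     * key   = file name (usable as a document id),
     * value = all lines of the file joined into one body.
     */
    public static Map<String, String> readOneDocPerFile(Path dir) throws IOException {
        List<Path> files;
        // Files.list must be closed, so collect the paths inside try-with-resources.
        try (Stream<Path> entries = Files.list(dir)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> docs = new TreeMap<>(); // sorted by file name for stable output
        for (Path file : files) {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            // All lines of one conversation become a single document body.
            docs.put(file.getFileName().toString(), String.join("\n", lines));
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, String> doc : readOneDocPerFile(dir).entrySet()) {
            System.out.println(doc.getKey() + ": " + doc.getValue().length() + " characters");
        }
    }
}
```

Joining with a newline keeps the original line breaks inside the body; for Solr's text analysis a space would work equally well.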


For example:

I have input text documents with data like the following:

Document1: This is the first document of selling information.
Document2: This is the second document of gathering information.

I also have a separate lookup file with data like the following:
selling:CatA
gathering:CatB
information:CatC

Now I would like to cluster the documents, with the output generated as:
Document1:CatA,CatC
Document2:CatB,CatC
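Because the categories come from a fixed lookup file, the expected output is closer to keyword-based tagging than to clustering in the unsupervised sense. A minimal sketch of that tagging, assuming naive tokenization (lower-case, split on non-alphanumeric characters) and exact term matches; the class and method names are hypothetical and no Solr API is involved. Categories are returned once each, in the order their terms first appear in the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: tags documents with categories from a "term:Category" lookup.
public class LookupCategorizer {

    /** Parses lines of the form "term:Category" into a term -> category map. */
    public static Map<String, String> parseLookup(List<String> lines) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) continue; // skip blank or malformed lines
            String term = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            // Strip stray trailing punctuation such as a final '.'
            String category = line.substring(colon + 1).trim().replaceAll("\\p{Punct}+$", "");
            lookup.put(term, category);
        }
        return lookup;
    }

    /** Returns the distinct categories whose terms occur in text, in first-seen order. */
    public static List<String> categorize(String text, Map<String, String> lookup) {
        Set<String> categories = new LinkedHashSet<>();
        // Naive tokenization: lower-case, split on anything that is not a letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            String category = lookup.get(token);
            if (category != null) categories.add(category);
        }
        return new ArrayList<>(categories);
    }

    public static void main(String[] args) {
        Map<String, String> lookup = parseLookup(
                List.of("selling:CatA", "gathering:CatB", "information:CatC"));
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("Document1", "This is the first document of selling information.");
        docs.put("Document2", "This is the second document of gathering information.");
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            System.out.println(doc.getKey() + ":" + String.join(",", categorize(doc.getValue(), lookup)));
        }
    }
}
```

Running `main` prints `Document1:CatA,CatC` and `Document2:CatB,CatC`, matching the desired output above. Real conversations would likely need stemming or phrase matching, which this sketch does not attempt.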

Please let me know how to achieve this.

Re: Clustering unstructured text data

Posted by "tamanjit.bindra@yahoo.co.in" <ta...@yahoo.co.in>.
You may want to have a look at SolrJ, the Java client API for Solr: <http://wiki.apache.org/solr/Solrj>



--
View this message in context: http://lucene.472066.n3.nabble.com/Clustering-unstructured-text-data-tp4095241p4095444.html
Sent from the Solr - User mailing list archive at Nabble.com.