Posted to user@mahout.apache.org by Donni Khan <pr...@googlemail.com> on 2014/11/11 15:36:48 UTC
Remove instance from SequenceFile
Hi All,
I'm doing text mining with Mahout's algorithms. I'm calculating the
similarity between text documents. First I computed the TF-IDF vectors for
all documents (SequenceFile format). While computing the similarity, a lot
of documents turn out not to have any similar documents. So I would like to
remove those documents from the SequenceFile of vectors.
Any idea how to do that?
Thanks in advance,
Donni.