Posted to user@mahout.apache.org by Donni Khan <pr...@googlemail.com> on 2014/11/11 15:36:48 UTC

Remove instance from SequenceFile

Hi All,

I'm doing text mining with Mahout's algorithms and computing the
similarity between text documents. First I computed the TF-IDF vectors for
all documents (SequenceFile format). While computing the similarity, I found
that many documents do not have any similar documents, so I would like to
remove those documents from the SequenceFile of vectors.
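
To make the goal concrete, here is a minimal sketch of the filtering step
using plain Java collections in place of the real file. The class name
VectorFilter, the method keepOnly, and the document ids are hypothetical and
only for illustration. As far as I understand, a SequenceFile cannot be
edited in place, so the real version would presumably read each key/value
pair with SequenceFile.Reader and write the ones to keep into a new file
with SequenceFile.Writer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class VectorFilter {

    // Hypothetical helper: returns a new map holding only the entries whose
    // document id appears in idsWithSimilarDocs. The input map is not modified.
    static <V> Map<String, V> keepOnly(Map<String, V> vectors,
                                       Set<String> idsWithSimilarDocs) {
        Map<String, V> kept = new LinkedHashMap<>();
        for (Map.Entry<String, V> entry : vectors.entrySet()) {
            if (idsWithSimilarDocs.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the TF-IDF SequenceFile: document id -> weight vector.
        Map<String, double[]> tfidf = new LinkedHashMap<>();
        tfidf.put("doc1", new double[] {0.5, 0.0, 1.2});
        tfidf.put("doc2", new double[] {0.0, 0.9, 0.0});
        tfidf.put("doc3", new double[] {0.4, 0.0, 1.0});

        // Assumed to come from the similarity step: ids with at least one match.
        Set<String> idsWithSimilarDocs = new HashSet<>(Arrays.asList("doc1", "doc3"));

        Map<String, double[]> filtered = keepOnly(tfidf, idsWithSimilarDocs);
        System.out.println(filtered.keySet());
    }
}
```

The set of ids to keep is assumed to be collected from the similarity
output beforehand; the part I am unsure about is how to do the same thing on
the actual SequenceFile.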

Any idea how to do that?

Thanks in advance,

Donni.