Posted to user@mahout.apache.org by ivan obeso <se...@gmail.com> on 2012/04/23 10:53:22 UTC

Faceted search with Mahout

I want to implement a faceted search system with Mahout. I have a bunch of
documents, so I clustered them with fuzzy k-means and tried to use
TopDownClustering to give the clusters a hierarchy, but it was unsuccessful;
I think it is not the correct way to do this. I used LDA to extract the topics
of each cluster, but without the hierarchy it is impossible to offer the user
a faceted search (I think), because I have some clusters with their topics,
but all of them are on the same "level".

It would be great if someone could explain to me the best way to do this. All
ideas are welcome :) The more detailed the responses, the better.

Thanks.

Re: Faceted search with Mahout

Posted by Paritosh Ranjan <pr...@xebia.com>.
On 23-04-2012 14:23, ivan obeso wrote:
> I have a bunch of
> documents, so I clustered them with fuzzy k-means and tried to use
> TopDownClustering to give the clusters a hierarchy, but it was unsuccessful;
> I think it is not the correct way to do this.
Mahout cannot tell you which sub-cluster belongs to which parent
cluster; you will have to keep track of that yourself. If you record
which parent cluster produced which child cluster (and its vectors),
then you can simply use fuzzy k-means.

My advice: use fuzzy k-means (as it seems you have overlapping data),
store the vectors belonging to each cluster, then run the clustering
again on each cluster to produce its sub-clusters. Also keep track of
which parent cluster produced which sub-clusters.
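
The bookkeeping described above can be sketched roughly as follows. This is
only an illustration in Python, not Mahout code: build_hierarchy, cluster_fn
and halve are hypothetical names, and cluster_fn stands in for whatever
actually runs fuzzy k-means on a set of documents and returns a mapping from
cluster label to the documents assigned to it (a document may appear under
several labels when clusters overlap).

```python
def build_hierarchy(doc_ids, cluster_fn, max_depth=2, min_size=2):
    """Cluster top-down and record which parent produced which children.

    cluster_fn is a hypothetical stand-in for a real clustering run: it takes
    a list of document ids and returns {label: [document ids]}.
    """
    children = {}                       # parent cluster id -> child cluster ids
    members = {"root": list(doc_ids)}   # cluster id -> document ids

    def split(cluster_id, depth):
        docs = members[cluster_id]
        if depth >= max_depth or len(docs) < min_size:
            return                      # stop: deep enough or too few documents
        groups = cluster_fn(docs)
        if len(groups) < 2:
            return                      # the clusterer did not split this cluster
        children[cluster_id] = []
        for label, group in sorted(groups.items()):
            child_id = cluster_id + "/" + str(label)   # path encodes the lineage
            children[cluster_id].append(child_id)
            members[child_id] = list(group)
            split(child_id, depth + 1)  # repeat the clustering on the sub-cluster

    split("root", 0)
    return children, members


def halve(docs):
    """Toy clusterer used only to exercise the bookkeeping (not fuzzy k-means)."""
    docs = sorted(docs)
    mid = len(docs) // 2
    return {"a": docs[:mid], "b": docs[mid:]}


children, members = build_hierarchy([1, 2, 3, 4], halve)
```

With the toy clusterer, children maps "root" to its two sub-clusters and each
of those to its own sub-clusters, while members holds the documents of every
cluster, so the parent of any cluster can be read directly from its id.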

Solr has support for faceted search, so you can index this cluster
hierarchy along with the documents in Solr and use it to implement
faceted search.
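
One possible way to get a cluster hierarchy into Solr is the depth-prefixed
path convention often used for hierarchical facets: each document is indexed
with one token per level of its path, and the facet.prefix parameter restricts
the facet counts to the level being browsed. The sketch below assumes each
document's position is available as a slash-separated path of cluster labels;
the function names, the example labels and the field name cluster_path are
hypothetical, and with overlapping clusters the field would need to be
multi-valued.

```python
def facet_tokens(cluster_path):
    """Turn 'sports/football' into ['0/sports', '1/sports/football']."""
    parts = cluster_path.split("/")
    tokens = []
    for depth in range(len(parts)):
        prefix = "/".join(parts[:depth + 1])
        tokens.append(str(depth) + "/" + prefix)   # depth-prefixed path token
    return tokens


def drill_down_params(field, selected_path=None):
    """Build Solr query parameters listing the children of the selected cluster.

    With no selection, the top-level clusters are listed.
    """
    if selected_path:
        depth = selected_path.count("/") + 1
        prefix = str(depth) + "/" + selected_path + "/"
    else:
        prefix = "0/"
    return {
        "q": "*:*",
        "facet": "true",
        "facet.field": field,      # hypothetical field holding the tokens
        "facet.prefix": prefix,    # only count tokens one level below the selection
    }


tokens = facet_tokens("sports/football")
params = drill_down_params("cluster_path", "sports")
```

Here tokens would be stored in the document's facet field at index time, and
params would be sent with the search request when the user has selected the
"sports" cluster and wants to see its sub-clusters.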