Posted to user@mahout.apache.org by ivan obeso <se...@gmail.com> on 2012/04/23 10:53:22 UTC
Faceted search with Mahout
I want to implement a faceted search system with Mahout. I have a bunch of
documents, so I clustered them with fuzzy k-means and tried to use
TopDownClustering to give them a hierarchy, but it was unsuccessful; I
think it is not the correct way to do this. I used LDA to extract the topics
of each cluster, but without the hierarchy it is impossible to give the user
a faceted search (I think), because I have some clusters with their topics,
but all of them are on the same "level".
It would be great if someone could explain to me the best way to do this. All
ideas are welcome :) The more detailed the responses, the better.
Thanks.
Re: Faceted search with Mahout
Posted by Paritosh Ranjan <pr...@xebia.com>.
On 23-04-2012 14:23, ivan obeso wrote:
> I have a bunch of
> documents, so I clustered them with fuzzy k-means and tried to use
> TopDownClustering to give them a hierarchy, but it was unsuccessful; I
> think it is not the correct way to do this.
Mahout cannot tell you which sub-cluster belongs to which parent
cluster. You will have to keep track of that yourself. If you keep
track of which parent cluster produced which child cluster (and its
vectors), then you can simply use fuzzy k-means.
My advice: use fuzzy k-means (as it seems you have overlapping data), store
the vectors belonging to each cluster, then repeat the clustering on each
sub-cluster, keeping track of which parent cluster produced which sub-clusters.
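The bookkeeping described above could look roughly like the sketch below. It is not Mahout code: `build_hierarchy`, `cluster_fn` and `toy_cluster_fn` are hypothetical names, and `toy_cluster_fn` only stands in for one fuzzy k-means run so the example is runnable. In practice that step would be a Mahout job whose output you read back, and a document could land in more than one child cluster.

```python
def build_hierarchy(doc_ids, cluster_fn, max_depth=2, min_size=2):
    """Recursively cluster doc_ids and record which parent produced which child.

    cluster_fn(ids) is a placeholder for a single clustering run. It must
    return a dict mapping a local cluster label to the ids assigned to it.
    """
    parent_of = {}                        # child cluster id -> parent cluster id
    members = {"root": list(doc_ids)}     # cluster id -> ids it contains

    def recurse(cluster_id, ids, depth):
        # Stop when deep enough or when the cluster is too small to split.
        if depth >= max_depth or len(ids) < min_size:
            return
        for label, child_ids in cluster_fn(ids).items():
            child_id = f"{cluster_id}/{label}"   # path-style id encodes the ancestry
            parent_of[child_id] = cluster_id
            members[child_id] = list(child_ids)
            recurse(child_id, child_ids, depth + 1)

    recurse("root", list(doc_ids), 0)
    return parent_of, members


def toy_cluster_fn(ids):
    """Stand-in for a real clustering run: splits the sorted ids in half."""
    ids = sorted(ids)
    mid = len(ids) // 2
    return {"a": ids[:mid], "b": ids[mid:]}


parent_of, members = build_hierarchy([1, 2, 3, 4], toy_cluster_fn)
```

The path-style cluster ids (`root/a/b`) are just one convenient choice: they keep the parent relation visible in the id itself, which helps later when the hierarchy is exported to a search index.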
Solr has support for faceted search, so you can use this data with Solr
to implement faceted search.
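One possible way to expose the cluster hierarchy through Solr facets is sketched below, under several assumptions: each document is indexed with a multi-valued string field (here called `cluster_path`, a made-up name) holding depth-prefixed tokens such as `1/root/a`, and the UI asks for the children of the selected cluster with Solr's `facet`, `facet.field` and `facet.prefix` request parameters. The token scheme and the helper names are illustrative, not something Mahout or Solr prescribes.

```python
from urllib.parse import urlencode


def facet_tokens(cluster_path):
    """Expand 'root/a/b' into one depth-prefixed token per ancestor level."""
    parts = cluster_path.split("/")
    return [f"{depth}/{'/'.join(parts[:depth + 1])}" for depth in range(len(parts))]


def child_facet_params(selected_path, field="cluster_path"):
    """Build query parameters requesting facet counts for the direct
    children of selected_path (the field name is an assumption)."""
    depth = selected_path.count("/") + 1      # children sit one level deeper
    return {
        "q": "*:*",
        "rows": 0,                            # only facet counts are needed
        "facet": "true",
        "facet.field": field,
        "facet.prefix": f"{depth}/{selected_path}/",
    }


tokens = facet_tokens("root/a/b")             # values to index for one document
params = child_facet_params("root/a")
query_string = urlencode(params)              # append to your Solr select URL
```

A document that fuzzy k-means assigned to several clusters would simply carry the tokens of every path it belongs to, so it is counted under each matching facet.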