Posted to solr-user@lucene.apache.org by Geetu Ambwani <ge...@gmail.com> on 2011/11/24 16:30:44 UTC

Fwd: Clustering and FieldType


Begin forwarded message:

> From: Geetu Ambwani <ge...@gmail.com>
> Date: November 23, 2011 2:52:38 PM EST
> To: solr-user-info@lucene.apache.org
> Subject: Clustering and FieldType
> 

> Hi,
> I'm trying to use Carrot2 to cluster search results. I have it set up, except that it seems to treat the field as regular text instead of applying the custom filters I have.
> 
> My schema contains something like this:
> <field name="title" type="ic_text" indexed="true" stored="true" omitNorms="true"/>
> <field name="content" type="ic_text" indexed="true" stored="true" compressed="true"/>
>  
> ic_text is our internal fieldtype with some custom analysers that strip out certain special characters from the text. 
> 
> My solrconfig has something like this set up in our default search handler:
> <bool name="clustering">true</bool>
> <str name="clustering.engine">default</str>
> <bool name="clustering.results">true</bool>
> <!-- The title field -->
> <str name="carrot.title">title</str>
> <!-- The field to cluster on -->
> <str name="carrot.snippet">content</str>
> 
> In my search results I see clusters, but the cluster labels contain the special characters, which suggests that the clustering is running on the raw text and not on the analyzed "ic_text" field.
> Can someone let me know whether this is the default behavior and whether there is a way to fix it?
> Thanks!
> Geetu
> 

Re: Clustering and FieldType

Posted by Stanislaw Osinski <st...@osinski.name>.
Hi,

You're right -- currently Carrot2 clustering ignores the Solr analysis
chain and uses its own pipeline. It is possible to integrate with Solr's
analysis components to some extent; see the discussion here:
https://issues.apache.org/jira/browse/SOLR-2917.
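
One possible workaround, sketched under assumptions and not taken from
the issue above: since the clustering component works on the stored
field text rather than on the analyzed tokens, the text could be
cleaned on the client before indexing and sent in separate stored
fields. The field names title_clean and content_clean, the helper
names, and the regular expression below are all hypothetical; the
actual rules would have to mirror whatever the custom ic_text
analyzers strip.

```python
import re

# Assumption: "special characters" means anything that is not a word
# character (letter, digit, underscore) or whitespace. Replace this
# pattern with the rules your own ic_text analyzers apply.
_SPECIAL = re.compile(r"[^\w\s]")
_WHITESPACE = re.compile(r"\s+")


def strip_special_chars(text):
    """Replace special characters with spaces, then collapse whitespace."""
    without_special = _SPECIAL.sub(" ", text)
    return _WHITESPACE.sub(" ", without_special).strip()


def build_document(doc_id, title, content):
    """Build a dict to index: original fields plus cleaned copies.

    title_clean and content_clean are hypothetical field names; they
    would need to be declared as stored fields in the schema.
    """
    return {
        "id": doc_id,
        "title": title,
        "content": content,
        "title_clean": strip_special_chars(title),
        "content_clean": strip_special_chars(content),
    }
```

With documents built this way, carrot.title and carrot.snippet could
point at title_clean and content_clean, while searching continues to
use the original title and content fields.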

Staszek

