Posted to solr-user@lucene.apache.org by sam <sk...@gmail.com> on 2012/04/25 16:41:01 UTC
can I use different tokenizer/analyzer for facet count query?
I have the following in schema.xml:
<fieldType name="cq_tag" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="$"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="colors" type="cq_tag" indexed="true" stored="true" multiValued="true"/>
And I have the following doc:
<doc>
  <arr name="colors">
    <str>blues$Teal/Turquoise</str>
  </arr>
  ...
</doc>
The response to the query
http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=colors&rows=100
is:
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="colors">
      <int name="blues">1</int>
      <int name="blues$Teal/Turquoise">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
At index time, blues$Teal/Turquoise is tokenized into:
blues
blues$Teal/Turquoise
I think that's why the facet counts include both blues and blues$Teal/Turquoise.
Can I have the facet counts include only the whole keyword,
blues$Teal/Turquoise, and not blues?
Re: can I use different tokenizer/analyzer for facet count query?
Posted by Dan Tuffery <da...@gmail.com>.
If you use the KeywordTokenizer at index time too, it should do what you
want. If that is not possible, create another field.
Best practices for facet fields:
Indexed, not tokenized (KeywordTokenizer)
Not stored
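As a rough sketch of the second option (a separate field used only for faceting), the schema could gain something like the following. The names cq_tag_exact and colors_facet are made up for illustration and do not appear in the original schema; the field would still have to be populated, either by the indexing client or by a copyField.

```xml
<!-- Hypothetical type: a single analyzer is used at both index and query time,
     so each value is kept as one token. -->
<fieldType name="cq_tag_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical facet-only field: indexed, not stored. -->
<field name="colors_facet" type="cq_tag_exact" indexed="true"
       stored="false" multiValued="true"/>
```

Faceting would then use facet.field=colors_facet instead of facet.field=colors.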
On Wed, Apr 25, 2012 at 3:52 PM, sam <sk...@gmail.com> wrote:
> From wiki:
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> If you want both Analysis (for searching) and Faceting on the full literal
> Strings, *use copyField* to create two versions of the field: one Text and
> one String. Make sure both are indexed="true".
>
> Is that the only way? Do I need to have another field of type String? I'm
> using KeywordTokenizer for query...
Re: can I use different tokenizer/analyzer for facet count query?
Posted by sam <sk...@gmail.com>.
From the wiki:
http://wiki.apache.org/solr/SimpleFacetParameters
If you want both Analysis (for searching) and Faceting on the full literal
Strings, *use copyField* to create two versions of the field: one Text and
one String. Make sure both are indexed="true".
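For illustration, the copyField arrangement the wiki describes might look roughly like this for the schema above. The field name colors_str is an assumption made for this sketch; the string type is written the way it appears in the stock example schema.

```xml
<!-- String type: values are indexed verbatim, with no analysis. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- Hypothetical field holding the untokenized copy, used only for faceting. -->
<field name="colors_str" type="string" indexed="true"
       stored="false" multiValued="true"/>

<!-- copyField copies the raw input value, before the analysis applied to colors. -->
<copyField source="colors" dest="colors_str"/>
```

With this in place, and after reindexing, facet.field=colors_str should return only blues$Teal/Turquoise, while searches can still go against colors.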
Is that the only way? Do I need to have another field of type String? I'm
using KeywordTokenizer for query...