Posted to solr-user@lucene.apache.org by sam <sk...@gmail.com> on 2012/04/25 16:41:01 UTC

can I use different tokenizer/analyzer for facet count query?

I have the following in schema.xml:
    <fieldType name="cq_tag" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="$"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.KeywordTokenizerFactory"/>
        </analyzer>
    </fieldType>
    <field name="colors" type="cq_tag" indexed="true" stored="true" multiValued="true"/>


And, I have the following doc:
<doc>
    <arr name="colors">
        <str>blues$Teal/Turquoise</str>
    </arr>
    ...
</doc>


The response to the query
http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=colors&rows=100
is:

<lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
        <lst name="colors">
            <int name="blues">1</int>
            <int name="blues$Teal/Turquoise">1</int>
        </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
</lst>



At index time, blues$Teal/Turquoise is tokenized into:
blues
blues$Teal/Turquoise

I think that's why the facet counts include both blues and blues$Teal/Turquoise.

Can I have the facet counts include only the whole keyword,
blues$Teal/Turquoise, and not blues?

Re: can I use different tokenizer/analyzer for facet count query?

Posted by Dan Tuffery <da...@gmail.com>.

If you use the KeywordTokenizer at index time too, it should do what you
want. If that is not possible, create another field.

Best practices for facet fields:

Indexed, not Tokenized (KeywordTokenizer)
Not stored
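
A minimal sketch of the "create another field" option, assuming the schema
already defines a fieldType named "string" (solr.StrField, as in the example
schema shipped with Solr). The field name colors_facet is hypothetical. Facet
counts are computed from the indexed terms, so the query analyzer does not
affect them. copyField passes the raw value to the destination field before
any analysis, so colors_facet would index each value as a single term:

    <!-- hypothetical facet-only field: one untokenized term per value, not stored -->
    <field name="colors_facet" type="string" indexed="true" stored="false" multiValued="true"/>
    <!-- copy the raw, pre-analysis value of colors into the facet field -->
    <copyField source="colors" dest="colors_facet"/>

After reindexing, faceting on the new field instead of colors should list only
the full values such as blues$Teal/Turquoise, while colors itself keeps the
path-hierarchy tokens for searching:

http://localhost:8983/solr/select/?q=*:*&facet=true&facet.field=colors_facet&rows=100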


On Wed, Apr 25, 2012 at 3:52 PM, sam <sk...@gmail.com> wrote:

> From the wiki:
> http://wiki.apache.org/solr/SimpleFacetParameters
>
> If you want both Analysis (for searching) and Faceting on the full literal
> Strings, *use copyField* to create two versions of the field: one Text and
> one String. Make sure both are indexed="true".
>
> Is that the only way? Do I need to have another field of type String? I'm
> using KeywordTokenizer for query...

Re: can I use different tokenizer/analyzer for facet count query?

Posted by sam <sk...@gmail.com>.

From the wiki:
http://wiki.apache.org/solr/SimpleFacetParameters

If you want both Analysis (for searching) and Faceting on the full literal
Strings, *use copyField* to create two versions of the field: one Text and
one String. Make sure both are indexed="true".

Is that the only way? Do I need to have another field of type String? I'm
using KeywordTokenizer for query...
