Posted to users@solr.apache.org by Matthew Castrigno <ca...@slhs.org> on 2023/02/02 22:47:00 UTC

facet.field reported as individual words and not the complete string

Hello community, thank you for taking the time to read my question. Your insights are most appreciated.

I am making queries utilizing the facet.field parameter. The field I provide is multivalued. It is indexed using a dynamic field:
<dynamicField name="*_ss"  type="text_general"  indexed="true"  stored="true" required="false" multiValued="true" />
It is indexed with the name "facets_ss".
The values are often strings with spaces between the words, resulting in fields like this:
        "facets_ss":["Health Services",
          "Boise",
          "Meridian",
          "Provider",
          "St. Luke's Health Partner",
          "Female",
          "English"]
In my search results, however, facet_fields are reported based on individual words. So for the above there would be separate values for st, luke's, health and partner,
and not "St. Luke's Health Partner".

How can I get these values reported for complete strings and not just the individual words?

Thank you so much.

(some facets deleted for reading clarity)
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {
            "facets_ss": [
                "boise",
                1,
                "english",
                3,
                "female",
                0,
                "luke's",
                0,
                "meridian",
                1,
                "nampa",
                1,
                "health",
                0,
                "partner",
                0,
                "provider",
                0,
                "services",
                2,
                "st",
                0
            ]
        },
        "facet_ranges": {},
        "facet_intervals": {},
        "facet_heatmaps": {}
    }





Matthew Castrigno

IHT Developer II

St. Luke’s Health System

•  208-859-4276
•  castrigm@slhs.org<ma...@slhs.org>

----------------------------------------------------------------------
"This message is intended for the use of the person or entity to which it is addressed and may contain information that is confidential or privileged, the disclosure of which is governed by applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this information is strictly prohibited. If you have received this message by error, please notify us immediately and destroy the related message."

Re: facet.field reported as individual words and not the complete string

Posted by Andy C <an...@gmail.com>.
You can also use a dynamic field as the destination. So with the
configuration below, if your document had a field 'foo_ss', Solr would also
populate a 'foo_facets' field.

<dynamicField name="*_facets" type="string" indexed="true" stored="true" required="false" multiValued="true" />
<dynamicField name="*_ss" type="text_general" indexed="true" stored="true" required="false" multiValued="true" />
<copyField source="*_ss" dest="*_facets"/>
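
As a rough illustration of the name mapping described above, the following Python sketch shows how the part of the field name matched by the wildcard in the source pattern could be substituted into the destination pattern. The function name resolve_copy_dest is hypothetical and this is not Solr's implementation; it only mirrors the 'foo_ss' to 'foo_facets' behavior mentioned in this message.

```python
def resolve_copy_dest(field_name, source_pattern, dest_pattern):
    """Return the destination field name, or None if the source does not match.

    Illustrative only: assumes at most one '*' in each pattern.
    """
    if "*" not in source_pattern:
        # Plain source name: it must match exactly.
        return dest_pattern if field_name == source_pattern else None

    prefix, suffix = source_pattern.split("*", 1)
    long_enough = len(field_name) >= len(prefix) + len(suffix)
    if not (long_enough
            and field_name.startswith(prefix)
            and field_name.endswith(suffix)):
        return None

    # Text matched by the wildcard, e.g. "foo" for "foo_ss" against "*_ss".
    wildcard = field_name[len(prefix):len(field_name) - len(suffix)]
    return dest_pattern.replace("*", wildcard, 1)


print(resolve_copy_dest("foo_ss", "*_ss", "*_facets"))
print(resolve_copy_dest("facets_ss", "*_ss", "*_facets"))
```

Note that with this configuration the original 'facets_ss' field would map to 'facets_facets', so the facet.field parameter would need to name that field.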

- Andy -

On Thu, Feb 2, 2023 at 6:38 PM Jeremy Buckley - IQ-C
<je...@gsa.gov.invalid> wrote:

> Yes, you should be able to use a dynamicField as the source.  That's really
> just a shorthand to keep you from having to enumerate all possible field
> names in the schema.  I don't think order matters, but I tend to put
> copyField directives after all the field and dynamicField definitions, for
> readability if nothing else.
>

Re: facet.field reported as individual words and not the complete string

Posted by Jeremy Buckley - IQ-C <je...@gsa.gov.INVALID>.
Yes, you should be able to use a dynamicField as the source.  That's really
just a shorthand to keep you from having to enumerate all possible field
names in the schema.  I don't think order matters, but I tend to put
copyField directives after all the field and dynamicField definitions, for
readability if nothing else.

Re: facet.field reported as individual words and not the complete string

Posted by Matthew Castrigno <ca...@slhs.org>.
Thank you Jeremy.

Can I use a dynamic field as the source field in a copyField directive?
Is the order in which these statements appear relevant?

Thank you!

<field name="facets" type="string" indexed="true" stored="true" required="false" multiValued="true" />
<copyField source="facets_ss" dest="facets"/>
<dynamicField name="*_ss"  type="text_general"  indexed="true"  stored="true" required="false" multiValued="true" />



________________________________
From: Jeremy Buckley - IQ-C <je...@gsa.gov.INVALID>
Sent: Thursday, February 2, 2023 4:16 PM
To: users@solr.apache.org <us...@solr.apache.org>
Subject: Re: facet.field reported as individual words and not the complete string

The culprit here is text_general.  Your field is getting tokenized at index
time, and each token gets returned as a facet value.  Fields that you plan
to use for faceting (or sorting) should be string or some numeric type.
Common practice is to define a second field of type string and use a
copyField directive in your schema to copy the value of the first field
into the second.  Use the new field for sorting and faceting, and use the
original text_general field for full text search.


Re: facet.field reported as individual words and not the complete string

Posted by Jeremy Buckley - IQ-C <je...@gsa.gov.INVALID>.
The culprit here is text_general.  Your field is getting tokenized at index
time, and each token gets returned as a facet value.  Fields that you plan
to use for faceting (or sorting) should be string or some numeric type.
Common practice is to define a second field of type string and use a
copyField directive in your schema to copy the value of the first field
into the second.  Use the new field for sorting and faceting, and use the
original text_general field for full text search.
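
To make the difference concrete, here is a small Python sketch that counts facet values two ways: once over tokens, as happens with an analyzed field, and once over whole values, as happens with a string field. The tokenize function is a hypothetical, simplified stand-in for the text_general analysis chain (lowercasing and splitting on non-word characters), not Solr's actual analyzer, and the two sample documents are invented for illustration.

```python
import re
from collections import Counter

# Simplified stand-in for text_general: lowercase, keep word-internal apostrophes.
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


def tokenize(value):
    return TOKEN.findall(value.lower())


def facet_counts(docs, analyzed):
    """Count, per term, how many documents contain it."""
    counts = Counter()
    for values in docs:
        terms = set()
        for value in values:
            # Analyzed field: each token is a term. String field: the whole value is.
            terms.update(tokenize(value) if analyzed else [value])
        counts.update(terms)
    return counts


# Invented sample documents, each a list of multivalued field entries.
docs = [
    ["Health Services", "Boise", "St. Luke's Health Partner"],
    ["Health Services", "Meridian"],
]

tokenized = facet_counts(docs, analyzed=True)   # like faceting on facets_ss
whole = facet_counts(docs, analyzed=False)      # like faceting on the string copy

print(sorted(tokenized))
print(sorted(whole))
```

The tokenized counts contain entries such as "st", "luke's" and "health", while the whole-value counts keep "St. Luke's Health Partner" intact, which is the behavior the copyField approach is meant to provide.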