Posted to users@solr.apache.org by Deepak Michael <dm...@proofpoint.com.INVALID> on 2023/02/27 22:13:48 UTC

Filtering facets

Hi

I have a multi-valued field, containing usernames, that I want to facet on, but I only want facets on a subset of those usernames.
For example, if I have these docs:

doc1: { field1: [user1, user2, user4] }
doc2: { field1: [user1, user3, user5] }
doc3: { field1: [user3, user4, user6] }

I'd like to facet on field1 so that I get the following results:

user1: 2
user2: 1

I tried having a query limiting the usernames and a facet on the usernames field, like so:

{
  "query": "field1:( user1 OR user2)",
  "facet": { "specialUsers" : {"type": "terms", "field" : "field1"}}
}

but that returns facets for every value in the matching documents (including user3, user4 and user5).
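
To make the gap concrete, here is a small Python simulation of the three sample documents. It is not Solr code, and the helper names are made up for illustration. It mimics a terms facet computed over the documents that match the query, and then the same facet restricted to an allowlist of values.

```python
from collections import Counter

# The three sample documents from the message above.
DOCS = {
    "doc1": ["user1", "user2", "user4"],
    "doc2": ["user1", "user3", "user5"],
    "doc3": ["user3", "user4", "user6"],
}


def matching(docs, wanted):
    """Keep documents whose field1 holds at least one wanted value (the OR query)."""
    return {doc_id: values for doc_id, values in docs.items() if wanted & set(values)}


def terms_facet(docs, allowlist=None):
    """Count documents per field1 value, optionally keeping only allowlisted values."""
    counts = Counter()
    for values in docs.values():
        for value in set(values):  # a document counts once per distinct value
            if allowlist is None or value in allowlist:
                counts[value] += 1
    return dict(counts)


wanted = {"user1", "user2"}
hits = matching(DOCS, wanted)          # doc1 and doc2 match the query
unfiltered = terms_facet(hits)         # also counts user3, user4 and user5
filtered = terms_facet(hits, wanted)   # only user1 and user2 remain
```

The query narrows the set of documents, not the set of values inside them, which is why the extra usernames show up in `unfiltered`.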

The number of unique usernames across all documents in our index could be in the millions.
I tried the `query` facet, but that would mean adding a `query` facet for each username I want a bucket for.
I also see that the `prefix` option does something similar to what we want: it creates a bucket for each username that matches the criterion (a prefix).
What we need is something like a whitelist of field values for which facets are included. We're looking at a custom faceting solution for this.
With what’s available, is it possible to extend the JSON facets so that I can register a new custom facet processor? Or is there another way to do this?

Thanks,
Deepak

Re: Filtering facets

Posted by Andy C <an...@gmail.com>.
Have you looked at
https://solr.apache.org/guide/8_11/faceting.html#limiting-facet-with-certain-terms

Is something like *facet.field={!terms='user1,user2'}field1* what you want?
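
As a rough sketch of how that parameter could be assembled from a list of usernames, the Python below builds the request parameters for a legacy field facet. The function name is made up, and the endpoint in the final comment is a hypothetical host and collection. It assumes the terms contain no commas or quotes, since those would need escaping inside the `terms` local parameter.

```python
from urllib.parse import urlencode


def legacy_terms_facet_params(field, terms, query="*:*"):
    """Build request parameters for a legacy field facet limited to the given terms."""
    for term in terms:
        # Assumption: plain terms only; commas and quotes would need escaping.
        if "," in term or "'" in term:
            raise ValueError("term needs escaping: %r" % term)
    term_list = ",".join(terms)
    return [
        ("q", query),
        ("rows", "0"),  # only the facet counts are of interest here
        ("facet", "true"),
        ("facet.field", "{!terms='%s'}%s" % (term_list, field)),
    ]


params = legacy_terms_facet_params("field1", ["user1", "user2"])
query_string = urlencode(params)
# Hypothetical endpoint: http://localhost:8983/solr/mycollection/select?<query_string>
```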

- Andy -

Re: Filtering facets

Posted by Michael Gibney <mi...@michaelgibney.net>.
This is one of the few remaining feature gaps (afaik) between legacy
facets and JSON facets. There's a relevant Jira issue
(https://issues.apache.org/jira/browse/SOLR-14921) that summarizes the
state of things pretty well, including what I think would be a
workaround for your case (if a bit verbose): enumerating the terms of
interest in `query` subfacets, e.g.:

    facet: {
      alfa: {
        type: query,
        q: "{!term f=symbol v=alfa}"
      },
      betta: {
        type: query,
        q: "{!term f=symbol v=betta}"
      },
      with_space: {
        type: query,
        q: "{!term f=symbol v='with,with\',with space'}"
      }
    }

The workaround you tried will (as you found) not work on multi-valued
fields, but the above should (at the expense of verbosity, which won't
be practical if your term allowlist is in the millions). Then again,
if your allowlist numbers in the millions, how were you thinking of
filtering, in an ideal world? Regex or something?
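
For a modest allowlist, the verbose request can be generated instead of written by hand. The Python below is a minimal sketch under a few assumptions: the function name is made up, each username is assumed to be usable as a facet name, and the value is passed as the body of the `{!term}` query rather than through `v=`, which should be equivalent and avoids quoting the value.

```python
import json


def allowlist_facets(field, terms):
    """Return a JSON Facet 'facet' block with one query subfacet per allowlisted term."""
    facets = {}
    for term in terms:
        # {!term f=<field>}<value> reads the raw value after the local params,
        # so the value needs no extra quoting inside the query string.
        facets[term] = {"type": "query", "q": "{!term f=%s}%s" % (field, term)}
    return facets


request_body = {
    "query": "*:*",
    "limit": 0,  # only the facet counts are of interest here
    "facet": allowlist_facets("field1", ["user1", "user2"]),
}
payload = json.dumps(request_body, indent=2)
```

The request grows by one subfacet per allowlisted term, so this only stays practical while the list is short.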

Re: Filtering facets

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello, Deepak.
Old facets also have related parameters: facet.excludeTerms, facet.contains,
and facet.matches, but these do not fully fit the problem.
JSON Facets are not easy to extend. Presumably it could be done by
implementing an extended FacetModule and deploying it as a component plugin.

-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!