Posted to users@solr.apache.org by Anthony Bouch <an...@infonomic.io> on 2022/02/05 12:45:37 UTC

Schema Design for EAV or Dynamic Facet Attributes

Hi All,

We're trying to create a schema that will let us provide faceted search over documents that may have an arbitrary set of attributes and values. In an RDBMS model, this would typically be an Entity-Attribute-Value (EAV) schema.

Here's an example of one type of document, with trait_type values of Variant, Rank, Chest, Face, and Hands.

{
  "name": "Name here.",
  "description": "Text here...",
  "attributes": [
    {
      "trait_type": "Variant",
      "value": "Service Unit"
    },
    {
      "trait_type": "Rank",
      "value": 5861
    },
    {
      "trait_type": "Chest",
      "value": "Robe",
      "occurrence": 10.0
    },
    {
      "trait_type": "Face",
      "value": "Shades",
      "occurrence": 9.8
    },
    {
      "trait_type": "Hands",
      "value": "Handheld Console",
      "occurrence": 2.6
    }
  ]
}

We have 10,000 documents that are of the above 'document type'.

Another document type may have different traits.

In the case of the document above, we'd like a faceted browser that looks like this:

Variant
  Service Unit (122)
  Combat Unit (100)
  Other (231)
Chest
  Robe (122)
  Amber (231)
  Diamond (100)
Face
  Shades (123)
  VR (23)
Hands
  Handheld Console (23)
  Sword (87)
  Gun (12)

etc...

As an experiment, we imported these documents into a core running in 'schemaless' mode with a managed schema, and this is the resulting Solr document:


{
        "name":["Name here"],
        "description":["Description here..."],
        "attributes.trait_type":["Variant",
          "Rank",
          "Antenna",
          "Chest",
          "Face",
          "Head",
          "Hands"],
        "attributes.value":["Automaton",
          "309",
          "Boosted Signal",
          "Gold",
          "Glowering",
          "Baseball Cap",
          "Baseball Bat"],
        "attributes.occurrence":[8.7,
          3.4,
          9.6,
          4.7,
          4.7],
        "id":"4a972d2a-df0b-4112-90f2-eef8d153d874",
        "_version_":1723919631435956224}


There's an older Stack Overflow question and answer here, which describes a similar problem...

https://stackoverflow.com/questions/7512392/facet-dynamic-fields-with-apache-solr/14529566#14529566

... but I was wondering if anyone could suggest another approach? We're unlikely to have very many document types, so a completely 'generalized' solution might not be necessary. For example, we could 'normalize' the documents as they are indexed into known fields like variant, chest, face, and hands, creating new fields as required (and then filter by document type).
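
A minimal sketch of that normalization step, in Python, run on each document before it is sent to Solr. The function name normalize, the doc_type_s field, and the "_s" suffix are all assumptions for illustration; the suffix presumes the core has a string dynamic field such as "*_s", which may not be the case for every schema.

```python
import re


def normalize(doc, doc_type):
    """Flatten the 'attributes' list into one field per trait type.

    Hypothetical helper: field names and the '_s' suffix are assumptions,
    not something prescribed by Solr or by the original document format.
    """
    flat = {
        "name": doc["name"],
        "description": doc.get("description", ""),
        # Assumed field used later to filter by document type.
        "doc_type_s": doc_type,
    }
    for attr in doc.get("attributes", []):
        # Lowercase the trait name and replace runs of other characters
        # with underscores, e.g. "Left Hand" becomes "left_hand".
        key = re.sub(r"[^a-z0-9]+", "_", attr["trait_type"].lower()).strip("_")
        # Values are stored as strings so that every trait facets the same way.
        flat[f"{key}_s"] = str(attr["value"])
    return flat
```

With fields like these, each trait could be faceted on directly, at the cost of knowing (or discovering) the trait field names for each document type.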

Thoughts or suggestions greatly appreciated.
  
Best,

Tony

Anthony Bouch
anthony@infonomic.io

Re: Schema Design for EAV or Dynamic Facet Attributes

Posted by Yonik Seeley <ys...@gmail.com>.

Depending on exactly what type of facet queries you need to do, you may be
able to get by with concatenating "trait_type" and "value" and putting all of
the resulting values in a single field.

{
  "name": "Name here.",
  "traits": ["Variant=Service Unit", "Rank=5861", "Chest=Robe", ...]
}

Then, after faceting on the "traits" field, the client side would need to
group everything by trait_type (by prefix, e.g. "Variant=").
This would not allow for per-field (or per-trait_type) facet limits, or for
range faceting, of course.
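
A rough sketch of both halves of this, in Python. The helper names to_traits and group_facets are made up for illustration, and the grouping assumes the facet counts arrive as a flat, alternating [term, count, term, count, ...] list, as in Solr's default JSON output for field facets; a different json.nl setting would need different parsing.

```python
from collections import defaultdict


def to_traits(attributes, sep="="):
    """Index side: build the values for the single multi-valued 'traits' field."""
    return [f"{a['trait_type']}{sep}{a['value']}" for a in attributes]


def group_facets(flat_counts, sep="="):
    """Client side: regroup facet counts on 'traits' by trait_type prefix.

    flat_counts is assumed to alternate term, count, term, count, ...
    """
    grouped = defaultdict(list)
    for term, count in zip(flat_counts[::2], flat_counts[1::2]):
        # Split on the first separator only, so values may contain it too.
        trait_type, _, value = term.partition(sep)
        grouped[trait_type].append((value, count))
    return dict(grouped)
```

Splitting on the first separator means a trait_type must never contain "=" itself; a less common separator character may be a safer choice if trait names are not controlled.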

-Yonik

