Posted to solr-user@lucene.apache.org by Gelszus Julia <j....@fis-gmbh.de> on 2019/02/01 06:30:44 UTC

Query over nested documents with an AND Operator

Hello,
I recently started working with Apache Solr 7.6.0 on a project and ran into the following problem with my queries. I would like to know whether Solr supports this at all:
We want to index nested XML documents with the following structure:
- There are 1-n parent documents, each containing the article data (short text, long text, etc.).
- In addition, each parent document can have 0-n child documents containing the attributes of the article.
- Example of an XML file:
<doc>
  <field name="id">1</field>
  <field name="article">4711</field>
  <field name="shorttext">here is a short text dealing with plastic and brass</field>
  <field name="longtext">here is a detailed description</field>
  <field name="contenttype">parentDocument</field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="article">4811</field>
  <field name="shorttext">here is a shorttext</field>
  <field name="longtext">here you will find a detailed description</field>
  <field name="contenttype">parentDocument</field>
  <field name="attributes">
    <doc>
      <field name="id">2_1</field>
      <field name="attributename">material </field>
      <field name="attributevalue">brass</field>
    </doc>
    <doc>
      <field name="id">2_2</field>
      <field name="attributename">material quality</field>
      <field name="attributevalue">plastic</field>
    </doc>
  </field>
</doc>
I need an AND operator between my search terms because I want the hits to be as accurate as possible. With a single search term I can already search across all parent and child documents and get the right result.
But when I search for two or more terms, for example plastic and brass, I want to get both the parent document of the matching child documents (article 4811) and article 4711, because both words appear in its short text. Instead, the result of my query is always only article 4711. I know that I could also write the attributes into a single field. However, I want to have a facet on the attribute name.

I hope you can help me with this problem.

Thank you very much,

Kind regards

Julia Gelszus

Bachelor of Science
Consultant SAP Development Workbench

FIS Informationssysteme und Consulting GmbH
Röthleiner Weg 1
97506 Grafenrheinfeld



Re: Query over nested documents with an AND Operator

Posted by Scott Stults <ss...@opensourceconnections.com>.
Hi Julia,

Keep in mind that in order to facet on child document fields you'll need to
use the block join facet component:
https://lucene.apache.org/solr/guide/7_4/blockjoin-faceting.html
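
A rough sketch of what such a request might look like, with Python used only to assemble the parameters. The child.facet.field parameter comes from the page linked above. The /bjqfacet handler name is an assumption: it has to be a request handler that you have configured with the block join facet component in solrconfig.xml. The field names follow the example document from the original post. As far as I know, the component expects the main query to be a {!parent} block join.

```python
from urllib.parse import urlencode


def block_join_facet_params(child_query, facet_field,
                            parent_filter="contenttype:parentDocument"):
    """Build request parameters for a block join query with a child facet.

    parent_filter must match all parent documents and nothing else.
    """
    return {
        # Main query: parents that have at least one matching child.
        "q": f"{{!parent which={parent_filter}}}{child_query}",
        # Facet on a field of the matching child documents.
        "child.facet.field": facet_field,
        "wt": "json",
    }


params = block_join_facet_params("attributevalue:brass", "attributename")
# "/bjqfacet" is a hypothetical handler name, see the note above.
print("/bjqfacet?" + urlencode(params))
```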

For the query itself you will probably need to specify each required
attribute value, but it looks like you're already heading down that path
with the facets. Add each one as a required local-params query inside a
query handled by the default query parser. The local queries themselves
would be block joins similar to this:

"+{!parent which=contenttype_s:parentDocument}attributevalue_s:brass
+{!parent which=contenttype_s:parentDocument}attributevalue_s:plastic"

That requires that a parent document satisfies both child document
constraints.
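
To also return parents such as article 4711, where all terms occur in the parent's own text, the required block joins can be ORed with a plain parent-level clause. The following is a minimal sketch under several assumptions: the field names come from the example document (without the _s suffix), the terms are single words that need no escaping, and the helper name build_and_query is made up for illustration. Each child query is passed through a separate request parameter and referenced with v=$name, which avoids ambiguity about where an embedded local-params query ends.

```python
from urllib.parse import urlencode


def build_and_query(terms, child_field="attributevalue",
                    parent_filter="contenttype:parentDocument",
                    text_field="shorttext"):
    """Build Solr parameters that require every term, either in the
    parent's text field or in the child attributes of the same parent."""
    params = {"parentFilter": parent_filter}
    join_clauses = []
    for i, term in enumerate(terms):
        key = f"child{i}"
        # The child query lives in its own parameter ...
        params[key] = f"{child_field}:{term}"
        # ... and is dereferenced inside a required block join clause.
        join_clauses.append(f"+{{!parent which=$parentFilter v=${key}}}")
    text_clauses = [f"+{text_field}:{term}" for term in terms]
    params["q"] = f"({' '.join(text_clauses)}) OR ({' '.join(join_clauses)})"
    return params


params = build_and_query(["plastic", "brass"])
print(params["q"])
print(urlencode(params))
```

With the example data, the first group is meant to match article 4711 and the second group article 4811. This relies on the default (lucene) query parser handling the top-level q parameter.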

Also, if you want to return the child documents you'll need to use the
ChildDocTransformerFactory:
"fl=id,[child parentFilter=contenttype_s:parentDocument]"
(I'm not sure if that's required if you just want to facet on the child doc
values and not display the other fields.)
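
A small sketch of how that fl value could be assembled, assuming the field names from the example document. The helper name child_fl is hypothetical. The limit argument maps to the transformer's limit parameter, which caps the number of children returned per parent (the default is 10, if I remember correctly).

```python
def child_fl(fields, parent_filter="contenttype:parentDocument", limit=None):
    """Build an fl value that returns the given parent fields plus the
    child documents of each matching parent."""
    transformer = f"[child parentFilter={parent_filter}"
    if limit is not None:
        # Optional cap on the number of children returned per parent.
        transformer += f" limit={limit}"
    transformer += "]"
    return ",".join(list(fields) + [transformer])


fl = child_fl(["id", "article"], limit=50)
print(fl)
```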

Hope that helps!

-Scott



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com

Re: Query over nested documents with an AND Operator

Posted by Mikhail Khludnev <mk...@apache.org>.
What's your current query? It's probably a question of building a boolean
query by combining Solr queries.
Note that this data model might be a little bit overwhelming. If the number
of distinct attributename values is around a thousand, just handle it via
dynamic fields without nesting docs:


  <field name="attributename_material">brass</field>
  <field name="attributename_material_quality">plastic</field>
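
A minimal sketch of that flattening step, done on the client before indexing. It assumes a dynamic field pattern such as attributename_* exists in the schema, and it replaces whitespace and other non-word characters in attribute names with underscores so that they form usable field names. The helper name flatten_attributes is made up for illustration.

```python
import re


def flatten_attributes(parent, attributes, prefix="attributename_"):
    """Copy the parent fields and add one dynamic field per attribute
    instead of one nested child document per attribute."""
    doc = dict(parent)
    for attr in attributes:
        # "material quality" -> "material_quality", "material " -> "material"
        name = re.sub(r"\W+", "_", attr["attributename"].strip())
        doc.setdefault(prefix + name, []).append(attr["attributevalue"])
    return doc


doc = flatten_attributes(
    {"id": "2", "article": "4811"},
    [
        {"attributename": "material ", "attributevalue": "brass"},
        {"attributename": "material quality", "attributevalue": "plastic"},
    ],
)
print(doc)
```

With documents shaped like this, requiring both values becomes an ordinary boolean query such as
+attributename_material:brass +attributename_material_quality:plastic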




-- 
Sincerely yours
Mikhail Khludnev