Posted to solr-user@lucene.apache.org by Bjørn Axelsen <bj...@fagkommunikation.dk> on 2014/07/16 17:01:16 UTC

Mixing ordinary and nested documents

Hi Solr users

I would appreciate your input on how to handle a *mix* of *simple* and
*nested* documents in the easiest and most flexible way.

I need to handle:

   - simple documents: webpages, short articles etc. (approx. 90% of the
   content)
   - nested documents: books containing chapters etc. (approx. 10% of the
   content)

For simple documents I just want to present straightforward search results
without any grouping etc.

For the nested documents I want to group by book and show book title, book
price etc. AND the individual results within the book. Let's say there is a
hit on "Chapter 1" and "Chapter 7" within "Book 1" and a hit on "Article
1". I would like to present this:

*Book 1 title*
Book 1 published date
Book 1 description
- *Chapter 1 title*
  Chapter 1 snippet
- *Chapter 7 title*
  Chapter 7 snippet

*Article 1 title*
Article 1 published date
Article 1 description
Article 1 snippet

It looks like it is pretty straightforward to use the CollapsingQParser to
collapse the book results into one result and not to collapse the other
results. But how about showing the information about the book (the parent
document of the chapters)?

1) Is there a way to do an *optional block join* to a *parent* document and
return it together *with* the *child* document, but without requiring a
parent document?

- or -

2) Do I need to require parent-child documents for everything? This is
really not my preferred strategy as only a small part of the documents is
in a real parent-child relationship. This would mean a lot of dummy child
documents.

- or -

3) Should I just denormalize data and include the book information within
each chapter document?

- or -

4) ... or is there a smarter way?

Your help is very much appreciated.

Cheers,

Bjørn Axelsen

Re: Mixing ordinary and nested documents

Posted by Bjørn Axelsen <bj...@fagkommunikation.dk>.
Thank you very much :-)

2014-07-22 16:34 GMT+02:00 Umesh Prasad <um...@gmail.com>:

> [...]

Re: Mixing ordinary and nested documents

Posted by Umesh Prasad <um...@gmail.com>.
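    // Imports needed: org.apache.solr.search.BitDocSet, DocIterator, DocSet,
    // SortedIntDocSet. childToParentDocMapping is the int[] built in my
    // previous message, kept as a static field of the same class.
    public static DocSet mapChildDocsToParentOnly(DocSet childDocSet) {
        // Collect the distinct parents of all matching child docs.
        DocSet mappedParentDocSet = new BitDocSet();
        DocIterator childIterator = childDocSet.iterator();
        while (childIterator.hasNext()) {
            int childDoc = childIterator.nextDoc();
            int parentDoc = childToParentDocMapping[childDoc];
            // add(), not addUnique(): several children share one parent, and
            // addUnique() assumes the doc is not in the set yet.
            mappedParentDocSet.add(parentDoc);
        }
        // BitDocSet iterates in increasing doc id order, so matches is sorted.
        int[] matches = new int[mappedParentDocSet.size()];
        DocIterator parentIter = mappedParentDocSet.iterator();
        for (int i = 0; parentIter.hasNext(); i++) {
            matches[i] = parentIter.nextDoc();
        }
        // You will need the SortedIntDocSet impl, else docset interaction in
        // some facet queries fails later.
        return new SortedIntDocSet(matches);
    }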
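
The same idea can be tried without Solr on the classpath. This is only a
minimal sketch: it assumes plain int arrays in place of DocSet and
java.util.BitSet in place of BitDocSet. The names ChildToParentHits and
mapToParents are hypothetical, and the check that skips -1 entries is an
added assumption for docs that have no parent, not part of the method above.

```java
import java.util.BitSet;

final class ChildToParentHits {
    /**
     * Maps matching child doc ids to the sorted, distinct doc ids of their
     * parents. childToParent[i] holds the parent of doc i, or -1 if none.
     */
    static int[] mapToParents(int[] childHits, int[] childToParent) {
        BitSet parents = new BitSet(childToParent.length);
        for (int child : childHits) {
            int parent = childToParent[child];
            if (parent >= 0) {
                parents.set(parent); // setting the same bit twice is harmless
            }
        }
        // BitSet.stream() yields the set bits in increasing order.
        return parents.stream().toArray();
    }
}
```

With the mapping {2, 2, 2, 3, 6, 6, 6} and child hits {0, 1, 3, 5}, the
result is {2, 3, 6}: two chapters of the same book collapse to one parent id.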



On 22 July 2014 19:59, Umesh Prasad <um...@gmail.com> wrote:

> [...]



-- 
---
Thanks & Regards
Umesh Prasad

Re: Mixing ordinary and nested documents

Posted by Umesh Prasad <um...@gmail.com>.
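    // Imports needed: org.apache.lucene.index.Term,
    // org.apache.lucene.search.Query, org.apache.lucene.search.TermQuery,
    // org.apache.solr.search.DocIterator, org.apache.solr.search.DocSet.
    // searcher is the SolrIndexSearcher.
    Query parentFilterQuery = new TermQuery(new Term("document_type", "parent"));

    // childToParentDocMapping[i] = doc id of the parent of doc i.
    int[] childToParentDocMapping = new int[searcher.maxDoc()];
    DocSet allParentDocSet = searcher.getDocSet(parentFilterQuery);
    DocIterator iter = allParentDocSet.iterator();
    int child = 0;
    // Assumes the iterator returns doc ids in increasing order (true for
    // BitDocSet and SortedIntDocSet). In a block, the children come right
    // before their parent, so every doc up to and including the parent
    // maps to that parent.
    while (iter.hasNext()) {
        int parent = iter.nextDoc();
        while (child <= parent) {
            childToParentDocMapping[child] = parent;
            child++;
        }
    }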
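
To see what the loop produces, here is a minimal plain-Java sketch on a toy
index layout. It assumes the parent doc ids are already available as a sorted
int[] instead of a DocSet, and the class name ParentMappingSketch is
hypothetical. Unlike the snippet above, it pre-fills the array with -1 so
that docs after the last parent are recognizable as unmapped.

```java
import java.util.Arrays;

final class ParentMappingSketch {
    /**
     * Builds mapping[doc] = doc id of the nearest parent at or after doc.
     * parentDocs must be sorted ascending; docs after the last parent get -1.
     */
    static int[] build(int maxDoc, int[] parentDocs) {
        int[] mapping = new int[maxDoc];
        Arrays.fill(mapping, -1);
        int child = 0;
        for (int parent : parentDocs) {
            while (child <= parent) {
                mapping[child++] = parent;
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Toy layout: 0,1 = chapters, 2 = book, 3 = article,
        //             4,5 = chapters, 6 = book.
        // Article 3 matches the parent filter:
        System.out.println(Arrays.toString(build(7, new int[] {2, 3, 6})));
        // Article 3 does NOT match the parent filter:
        System.out.println(Arrays.toString(build(7, new int[] {2, 6})));
    }
}
```

The first call prints [2, 2, 2, 3, 6, 6, 6] and the second prints
[2, 2, 2, 6, 6, 6, 6]. The second result shows a caveat for mixed content: a
standalone document that does not match the parent filter is mapped to the
next book. One way around this, offered as an assumption rather than
something stated in this thread, is to index standalone documents with the
same document_type value as parents, so that they map to themselves.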


On 22 July 2014 16:28, Bjørn Axelsen <bj...@fagkommunikation.dk>
wrote:

> [...]



-- 
---
Thanks & Regards
Umesh Prasad

Re: Mixing ordinary and nested documents

Posted by Bjørn Axelsen <bj...@fagkommunikation.dk>.
Thanks, Umesh

> You can get the parent bitset by running the parent doc type query on
> the Solr index searcher.
> Then get the child bitset by running the child doc type query. Then use these
> together to create an int[] where int[i] = parent of i.

Can you kindly add an example? I am not quite sure how to put this into a
query.

I can easily make the join from child to parent, but what I want to achieve
is to get the parent document added to the result if it exists, while keeping
the scoring from the child as well as the full child document. Is this
possible?

Cheers,
Bjørn

2014-07-18 19:00 GMT+02:00 Umesh Prasad <um...@gmail.com>:

> [...]

Re: Mixing ordinary and nested documents

Posted by Umesh Prasad <um...@gmail.com>.
Comments inline


On 16 July 2014 20:31, Bjørn Axelsen <bj...@fagkommunikation.dk>
wrote:

> [...]
>
> It looks like it is pretty straightforward to use the CollapsingQParser to
> collapse the book results into one result and not to collapse the other
> results. But how about showing the information about the book (the parent
> document of the chapters)?
>

You can map the child documents into the parent doc id space and extract the
information from the parent doc id.

First you need to generate the child doc to parent doc id mapping once.
You can get the parent bitset by running the parent doc type query on
the Solr index searcher. Then get the child bitset by running the child doc
type query. Then use these together to create an int[] where int[i] = parent
of i. This result is cacheable until the next commit. I am doing that for
computing facets from fields in parent docs and sorting on values from parent
docs (while getting child docs as output).
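
Once such an int[] exists, grouping scored hits under their parent for display
can be sketched in plain Java. This is only an illustration under stated
assumptions: the hits arrive sorted by descending score, every doc has a
mapping entry (standalone documents map to themselves), and the names
GroupHitsByParent, Hit and group are hypothetical rather than Solr API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupHitsByParent {
    /** A scored hit: internal doc id plus its own relevance score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /**
     * Groups hits under their parent doc id. The LinkedHashMap keeps groups
     * in the order of their best-scoring hit, and each hit keeps its score.
     */
    static Map<Integer, List<Hit>> group(List<Hit> hits, int[] childToParent) {
        Map<Integer, List<Hit>> groups = new LinkedHashMap<>();
        for (Hit hit : hits) {
            int parent = childToParent[hit.doc];
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add(hit);
        }
        return groups;
    }
}
```

When rendering, a group whose only hit is the parent itself can be shown as a
plain result (the article case), while any other group can be shown as the
parent's stored fields followed by its child hits (the book case).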




> [...]



-- 
---
Thanks & Regards
Umesh Prasad