Posted to users@solr.apache.org by Charlie Hubbard <ch...@gmail.com> on 2023/04/10 16:12:16 UTC

Issue indexing nested documents

Hi

I'm encountering the following error when indexing a parent document with
nested child documents.  I'm using managed schemas, Solr 6.6.6, and SolrJ to
send the documents up to Solr for indexing.  I have the default `_root_`
field defined in the schema, and the parent document can be indexed without
issue.  It's only when I include the field `pages` that Solr complains.

1040R.pdf was rejected by the server for Error from server at
http://localhost:8983/solr/mycollection: ERROR:
[doc=08464756-4ecd-4758-b8cc-9575d8a922ce] multiple values encountered for
non multiValued field pages: [SolrInputDocument(fields:

Here is the structure I'm trying to upload (this is a
pseudocode/JSON-ish representation to show the data, its data
types, and its structure.  This is NOT literally what I'm sending).

SolrInputDocument {
   id: "08464756-4ecd-4758-b8cc-9575d8a922ce",
   archiveDate_dt: "2023-04-08T12:23:43Z",
   _batchId: 251,
   _type: "document",
   ....
   *pages*: [
      SolrInputDocument {
         id: "1c482d15-6dd2-4bb2-8583-59231aa8db9b",
         archiveDate_dt: "2023-04-08T12:23:43Z",
         _batchId: 251,
         _pageNumber: 1,
         _type: "page",
         content: "lorem ipsum dolor...."
      },
      SolrInputDocument {
         ...
      },
      ...
   ]
}

It clearly doesn't like the field `pages`, which isn't defined in the
schema, but from reading the Solr documentation it seems like it doesn't
need to be.  My understanding is that Solr should recognize the nested
documents contained within the embedded List and know what to do.  Do I have
that right?  I'm piecing together information from several versions of Solr
because the 6.6.6 docs are pretty sparse about how things work, and later
versions do a better job of explaining it.

What am I doing wrong?

Thanks in advance
Charlie

Re: Issue indexing nested documents

Posted by Charlie Hubbard <ch...@gmail.com>.
Hi,

Thanks for that background information.  I don't really need separate lists
at the moment.  I can make do with these limitations of 6.6.6.

Thanks again
Charlie

On Tue, Apr 11, 2023 at 5:07 AM Mikhail Khludnev <mk...@apache.org> wrote:

> I suppose you are looking for functionality introduced at 7.5
> https://issues.apache.org/jira/browse/SOLR-12361 which I call named
> children.
> Before 7.5 it was only single anonymous children list/dimension:
> SID.addChildDocuments().
> Do you really need to distinguish children between a few lists (named
> fields) at 6.6?

Re: Issue indexing nested documents

Posted by Mikhail Khludnev <mk...@apache.org>.
I suppose you are looking for the functionality introduced in 7.5 by
https://issues.apache.org/jira/browse/SOLR-12361, which I call named
children.
Before 7.5 there was only a single anonymous list (dimension) of children:
SolrInputDocument.addChildDocuments().
Do you really need to distinguish children between several lists (named
fields) in 6.6?
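
A minimal sketch of the difference, written in Python rather than SolrJ so
that it runs without a Solr client library.  It assumes the parent is held as
a plain dict shaped like the pseudocode in the original post, and it moves the
named `pages` list under the `_childDocuments_` key that the 6.6 manual
mentions for JSON indexing.  The helper name `to_anonymous_children` is
hypothetical and not part of any Solr API.

```python
import copy
import json

# Key the Solr 6.6 manual names for child documents in JSON indexing.
CHILD_KEY = "_childDocuments_"


def to_anonymous_children(doc, child_field):
    """Return a copy of doc with the list under child_field moved to CHILD_KEY.

    Hypothetical helper: it only reshapes a dict and does not talk to Solr.
    """
    result = copy.deepcopy(doc)          # leave the caller's dict untouched
    children = result.pop(child_field, [])
    if children:
        result.setdefault(CHILD_KEY, []).extend(children)
    return result


# Data taken from the pseudocode in the original post (elided fields omitted).
parent = {
    "id": "08464756-4ecd-4758-b8cc-9575d8a922ce",
    "archiveDate_dt": "2023-04-08T12:23:43Z",
    "_batchId": 251,
    "_type": "document",
    "pages": [
        {
            "id": "1c482d15-6dd2-4bb2-8583-59231aa8db9b",
            "archiveDate_dt": "2023-04-08T12:23:43Z",
            "_batchId": 251,
            "_pageNumber": 1,
            "_type": "page",
            "content": "lorem ipsum dolor....",
        },
    ],
}

# A JSON array of documents, with the children in one anonymous list.
payload = json.dumps([to_anonymous_children(parent, "pages")], indent=2)
print(payload)
```

Because the children end up in one anonymous list, a field such as `_type` is
what would let a query tell pages apart from any other kind of child.  In
SolrJ the corresponding step would be calling addChildDocuments() on the
parent instead of setting a `pages` field.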


On Tue, Apr 11, 2023 at 4:57 AM Charlie Hubbard <ch...@gmail.com>
wrote:

> Hi,
>
> I'm not sure what you mean by "naming children" exactly.  I don't have a
> stack trace, but this video discusses nested documents, and refers to
> enhanced support in 6.x so I know 6.x supports nested documents.
>
> https://youtu.be/qV0fIg-LGBE?t=446
>
> Here is what is in the Solr 8 manual:
>
> >
> >    - Even though child documents are provided as field values
> >    syntactically and with SolrJ, it’s a matter of syntax and it isn’t an
> >    actual field in the schema. Consequently, the field need not be defined in
> >    the schema and probably shouldn’t be as it would be confusing. There is no
> >    child document field type, at least not yet.
>
> And in the Solr 6.6 manual:
>
> > Nested documents may be indexed via either the XML or JSON data syntax (or
> > using SolrJ)
> > <https://solr.apache.org/guide/6_6/using-solrj.html#using-solrj>
>
>
> However, no such example using SolrJ is included, leading me to look to
> other versions that document the feature better.  Solr 6.6 manual does
> refer to the special field in JSON indexing `_childDocuments_`.  It's not
> until Solr 8.7 manual does it provide an example of using Solrj to index
> nested child documents.
>
> https://solr.apache.org/guide/8_7/indexing-nested-documents.html
>
> The first example using SolrJ simply sets a field with a List of
> SolrInputDocuments which is what I'm doing.  The 2nd uses a method on
> SolrInputDocument.addChildDocuments(Collection<SolrInputDocument>).  I see
> that method available on SolrJ for 6.x so I'm assuming that is supported.
> I'm just curious if someone knows if the first method should be supported
> or not?
>
> Charlie


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

Re: Issue indexing nested documents

Posted by Charlie Hubbard <ch...@gmail.com>.
Hi,

I'm not sure what you mean by "naming children" exactly.  I don't have a
stack trace, but this video discusses nested documents, and refers to
enhanced support in 6.x so I know 6.x supports nested documents.

https://youtu.be/qV0fIg-LGBE?t=446

Here is what is in the Solr 8 manual:

>    - Even though child documents are provided as field values
>    syntactically and with SolrJ, it’s a matter of syntax and it isn’t an
>    actual field in the schema. Consequently, the field need not be defined in
>    the schema and probably shouldn’t be as it would be confusing. There is no
>    child document field type, at least not yet.

And in the Solr 6.6 manual:

> Nested documents may be indexed via either the XML or JSON data syntax (or
> using SolrJ)
> <https://solr.apache.org/guide/6_6/using-solrj.html#using-solrj>


However, no such example using SolrJ is included, leading me to look to
other versions that document the feature better.  The Solr 6.6 manual does
refer to the special field `_childDocuments_` in JSON indexing.  It's not
until the Solr 8.7 manual that an example of using SolrJ to index
nested child documents is provided.

https://solr.apache.org/guide/8_7/indexing-nested-documents.html

The first example using SolrJ simply sets a field with a List of
SolrInputDocuments, which is what I'm doing.  The second uses the method
SolrInputDocument.addChildDocuments(Collection<SolrInputDocument>).  I see
that method is available in SolrJ for 6.x, so I'm assuming it is supported.
I'm just curious whether anyone knows if the first method should be supported
or not?

Charlie

On Mon, Apr 10, 2023 at 3:36 PM Mikhail Khludnev <mk...@apache.org> wrote:

> Hello Charlie.
> My (just) guess is that the old version might not support naming children,
> unless you can prove it.
> If you share stacktrace with method names and line numbers it might give a
> clue regarding support for this functionality.

Re: Issue indexing nested documents

Posted by Mikhail Khludnev <mk...@apache.org>.
Hello Charlie.
My guess (and it is just a guess) is that the old version might not support
naming children, unless you can show otherwise.
If you share a stack trace with method names and line numbers, it might give a
clue regarding support for this functionality.



-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!

Re: Issue indexing nested documents

Posted by dmitri maziuk <dm...@gmail.com>.
On 2023-04-10 12:47 PM, Charlie Hubbard wrote:
> So I did try to add `_nest_path_` but I can't define it because the class
> `solr.NestPathField` isn't available in Solr 6.6.6.  I also have another
> strange error message about atomic updates like so:
...

_root_ needs to be stored and/or have docValues=true for atomic updates; it
isn't if you start from the default config. No idea if that is the cause
of your error though.

Dima


Re: Issue indexing nested documents

Posted by Charlie Hubbard <ch...@gmail.com>.
So I did try to add `_nest_path_` but I can't define it because the class
`solr.NestPathField` isn't available in Solr 6.6.6.  I also have another
strange error message about atomic updates like so:

some_document.pdf was rejected by the server for Error from server at
http://localhost:8983/solr/fusearchiver: RunUpdateProcessor has received an
AddUpdateCommand containing a document that appears to still contain Atomic
document update operations, most likely because
DistributedUpdateProcessorFactory was explicitly disabled from this
updateRequestProcessorChain

But I wasn't sure what this actually meant.  The structure of the document
was the same as what I posted in my first message, just for a different
file.  I wasn't sure how I was inadvertently creating update
commands either.  But I was going to focus on the message I mostly
understood, and come back to this one later.

Charlie

On Mon, Apr 10, 2023 at 12:30 PM dmitri maziuk <dm...@gmail.com>
wrote:

> On 2023-04-10 11:12 AM, Charlie Hubbard wrote:
> > Hi
> >
> > I'm encountering the following error when indexing a parent and nested
> > children documents.  I'm using managed schemas, Solr 6.6.6, and Solrj to
> > send the documents up to Solr for indexing.  I have the default `_root_`
> > field defined in the schema, and the parent document can be indexed
> without
> > issue.
>
> Do you have _nest_path_ defined? I also find it helps to have
> _nest_parent_ and "atomic updates" setup for _root_ and _nest_path_. But
> I have 8.11 and am not using SolrJ, so...
>
> Dima
>
>

Re: Issue indexing nested documents

Posted by dmitri maziuk <dm...@gmail.com>.
On 2023-04-10 11:12 AM, Charlie Hubbard wrote:
> Hi
> 
> I'm encountering the following error when indexing a parent and nested
> children documents.  I'm using managed schemas, Solr 6.6.6, and Solrj to
> send the documents up to Solr for indexing.  I have the default `_root_`
> field defined in the schema, and the parent document can be indexed without
> issue.

Do you have _nest_path_ defined? I also find it helps to have
_nest_parent_ and "atomic updates" set up for _root_ and _nest_path_. But
I have 8.11 and am not using SolrJ, so...

Dima