Posted to solr-user@lucene.apache.org by David Kramer <Da...@shoebuy.com> on 2017/02/01 21:29:11 UTC

Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”


Some background:
- The data involved is catalog data, with three nested objects: Products, Items, and Skus, in that order. We have a docType field on each record as a differentiator.
- The "id" field in our data is unique within a docType, but not across docTypes. The program that generates our Solr import file adds a "uuid" field, which is the id prefixed by the first letter of the docType, like P12345. That makes the uuid field unique, and we have set it as the uniqueKey in our schema.xml.
- We are trying to retrieve the parent Product and all of its child documents. As such, we are using the ChildDocTransformerFactory ([child...]) to retrieve the children along with the parent. We have not yet solved the problem of getting SKUs nested within Items in the results, and we will have to figure that out at some point, but for now we get them flattened.
- We are building out the proof of concept for this. This is all new work, so we are free to change a lot.
- This is Solr 6.0.0, and we are importing in JSON format, if that matters.
- I submitted this question to StackOverflow <http://stackoverflow.com/questions/41969353/solr-querying-nested-documents-with-childdoctransformerfactory-get-parent-quer> but haven’t gotten any answers yet.


Our data looks like this (I've removed some fields for simplicity):

{
  "id": 739063,
  "docType": "Product",
  "uuid": "P739063",
  "_childDocuments_": [
    {
      "id": 1537378,
      "price": 25.45,
      "color": "Blush",
      "docType": "Item",
      "productId": 739063,
      "uuid": "I1537378",
      "_childDocuments_": [
        {
          "id": 12799578,
          "size": "10",
          "width": "W",
          "docType": "Sku",
          "itemId": 1537378,
          "uuid": "S12799578"
        }
      ]
    }
  ]
}
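
The uuid scheme described above (first letter of the docType followed by the id) can be sketched as follows. This is only an illustration and not the actual import program; the function names assign_uuids, iter_docs, and duplicate_uuids are hypothetical, and the sample document is trimmed from the one shown above.

```python
from collections import Counter


def assign_uuids(doc):
    """Return a copy of doc (and its nested children) with uuid = docType initial + id."""
    out = dict(doc)
    out["uuid"] = f"{doc['docType'][0]}{doc['id']}"
    children = doc.get("_childDocuments_")
    if children is not None:
        out["_childDocuments_"] = [assign_uuids(child) for child in children]
    return out


def iter_docs(doc):
    """Yield doc and every descendant, depth first."""
    yield doc
    for child in doc.get("_childDocuments_", []):
        yield from iter_docs(child)


def duplicate_uuids(roots):
    """Return the sorted uuid values that occur more than once across all levels."""
    counts = Counter(d["uuid"] for root in roots for d in iter_docs(root))
    return sorted(u for u, n in counts.items() if n > 1)


# Trimmed sample: one Product with one Item and one Sku.
product = {
    "id": 739063, "docType": "Product",
    "_childDocuments_": [{
        "id": 1537378, "docType": "Item", "productId": 739063,
        "_childDocuments_": [
            {"id": 12799578, "docType": "Sku", "itemId": 1537378},
        ],
    }],
}

tagged = assign_uuids(product)
print([d["uuid"] for d in iter_docs(tagged)])  # ['P739063', 'I1537378', 'S12799578']
print(duplicate_uuids([tagged]))               # []
```

Running duplicate_uuids over the whole import file before loading is one cheap way to confirm that the uuid values really are unique across all three levels.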



The query to fetch all Products with their children nested inside them is q=docType:Product&fl=title,id,docType,[child parentFilter=docType:Product]. When I run that query, all is well, and it returns the first 10 rows. However, if I fetch more rows by adding, say, &rows=500, we get the error “Parent query yields document which is not matched by parents filter, docID=XXX”.
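
For reference, here is a minimal sketch of how that request could be assembled with proper URL encoding. The base URL http://localhost:8983/solr, the core name "catalog", and the helper name child_query_url are assumptions for illustration only; the q and fl values are the ones from the query above.

```python
from urllib.parse import urlencode


def child_query_url(base_url, core, rows=10):
    """Build the select URL for Products plus their children via the [child] transformer."""
    params = {
        "q": "docType:Product",
        "fl": "title,id,docType,[child parentFilter=docType:Product]",
        "rows": rows,
        "wt": "json",
    }
    # urlencode escapes the brackets, the space, and the '=' inside the fl value.
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical host and core name; rows=500 is the value that triggered the error.
url = child_query_url("http://localhost:8983/solr", "catalog", rows=500)
print(url)
```

Encoding the fl value matters because it contains a space, brackets, and an equals sign, all of which are easy to mangle when the URL is typed by hand.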

When we first saw that error, we discovered our id field was not unique across document types, so we added the uuid field as mentioned above, which is. We also added <uniqueKey>uuid</uniqueKey> to our schema.xml file, wiped the core, recreated it, and restarted Solr just to make sure it was in effect. We have double-checked and are sure that the uuid values are unique.



In all the search results for that error that I've found, the OP did not have a field that could differentiate the different document types, but as you see we do. Since both the query and the parentFilter are searching for docType:Product I don't see how either could possibly return anything but parents. We've also tried adding childFilter=docType:Item and childFilter=docType:Sku but that did not help.  I also tried using title:* for the filter since only products have titles.



Is there anything else we can try?

Any explanation of this?

Is it possible that it's not using uuid as the unique identifier even though it's specified in the schema.xml, and would that even cause this?

Thanks.



Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

Posted by David Kramer <Da...@shoebuy.com>.
For closure, I’ve solved the problem!  Solr was not using my schema.xml at all.  I had to change solrconfig.xml to include <schemaFactory class="ClassicIndexSchemaFactory"/> and comment out the schema adding processor.

My schema still didn’t work right, but I took the managed-schema and renamed it and changed uniqueKey to uuid and everything worked!
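
A quick way to double-check both settings offline is to parse the two config files. The sketch below assumes heavily trimmed, hypothetical file contents (the SOLRCONFIG and SCHEMA strings are not the real files) and hypothetical helper names; it only reads the <schemaFactory> class and the <uniqueKey> value described above.

```python
import xml.etree.ElementTree as ET


def schema_factory_class(solrconfig_xml):
    """Return the class of the top-level <schemaFactory> element, or None if absent."""
    node = ET.fromstring(solrconfig_xml).find("schemaFactory")
    return None if node is None else node.get("class")


def unique_key(schema_xml):
    """Return the text of the top-level <uniqueKey> element, or None if absent."""
    node = ET.fromstring(schema_xml).find("uniqueKey")
    return None if node is None or node.text is None else node.text.strip()


# Hypothetical, trimmed-down stand-ins for solrconfig.xml and the schema file.
SOLRCONFIG = """<config>
  <luceneMatchVersion>6.0.0</luceneMatchVersion>
  <schemaFactory class="ClassicIndexSchemaFactory"/>
</config>"""

SCHEMA = """<schema name="catalog" version="1.6">
  <field name="uuid" type="string" indexed="true" stored="true" required="true"/>
  <uniqueKey>uuid</uniqueKey>
</schema>"""

print(schema_factory_class(SOLRCONFIG))  # ClassicIndexSchemaFactory
print(unique_key(SCHEMA))                # uuid
```

If schema_factory_class returns None, solrconfig.xml does not name a schema factory and Solr falls back to its default, which may not read a hand-edited schema.xml.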

Thanks for your time and help.




Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

Posted by David Kramer <Da...@shoebuy.com>.
Yes, think of the starving orphan records…

Ours is an eCommerce system, selling mostly shoes.  We have three levels of nested objects representing what we sell:
- Product: Mostly title and description
- Item: A specific color and some other attributes, including price. Products have 1 or more Items, Items belong to one product.
- SKU: A specific size and SKU ID. Items have 1 or more SKUs, SKUs belong to one Item.
[PRODUCT  [ITEM  [SKU] [SKU] [SKU]] [ITEM [SKU]] ]

Products, items, and SKUs all have ID numbers. One product will never have the same ID as another product, but it’s possible for a product to have the same ID as an Item or a SKU. And that is the problem.  So the program that creates the import file adds a new field called uuid, that is a P, I, or S (for Product, Item, or SKU) followed by the ID.  We did it this way because my understanding is Solr can’t implement a compound unique key.  The uuid is unique across all documents, not just all documents of the same docType.

To test whether Solr would complain when the UUID of a document I was inserting was not unique, I grabbed the first few products from the full import file and changed the IDs so they no longer duplicate the real data, but left the UUIDs alone, so they still duplicate the real data, which was already loaded.

My expectation was that when I loaded the data I would get some error saying that UUID was already used.  YOUR expectation is that the record would be overwritten.  What actually happened is that the new documents got added with their duplicate UUIDs, which is the worst possible case.  This is why I think it’s not respecting my uniqueKey setting in schema.xml.
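
The three outcomes can be told apart with a toy model. This is only a sketch under a strong simplification: the "index" is a plain dict keyed by whatever field acts as the unique key, it ignores nested blocks entirely, and the sample documents and the load function are made up for illustration.

```python
def load(docs, unique_key):
    """Toy index: a later doc replaces an earlier one that has the same key value."""
    index = {}
    for doc in docs:
        index[doc[unique_key]] = doc
    return index


# Made-up data: two existing products, then two test docs with new ids but reused uuids.
existing = [{"id": 1, "uuid": "P1"}, {"id": 2, "uuid": "P2"}]
test_docs = [{"id": 9001, "uuid": "P1"}, {"id": 9002, "uuid": "P2"}]

# If uuid is the effective key, the test docs overwrite the existing ones.
print(len(load(existing + test_docs, "uuid")))  # 2

# If id is still the effective key, all four docs coexist with duplicate uuids.
print(len(load(existing + test_docs, "id")))    # 4
```

In this model, documents being added alongside the originals is what you would see if the effective key were still id rather than uuid.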

Does that make more sense?  I hope you can help me understand this discrepancy. Thanks for your efforts so far.



Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

Posted by Mikhail Khludnev <mk...@apache.org>.
David,
I don't quite follow how the IDs are assigned, but beware that repeating a
uniqueKey value causes the former occurrence to be deleted. In a block join
index this corrupts the block structure: a parent can't be deleted and its
children left as orphans (.. so touching, I'm sorry). Just make sure first that
the number of deleted docs is 0.
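
As a small illustration of that check: the number of deleted docs is the gap between maxDoc (every doc still physically in the index) and numDocs (live docs only), both of which the Solr Admin core overview displays. The sketch below assumes those two counters have already been copied into a dict; the sample numbers and the function name deleted_docs are made up.

```python
def deleted_docs(index_info):
    """Deleted-but-not-yet-merged docs: everything in maxDoc that is not live."""
    return index_info["maxDoc"] - index_info["numDocs"]


# Made-up counters for illustration only.
sample = {"numDocs": 1200, "maxDoc": 1215}
print(deleted_docs(sample))  # 15

clean = {"numDocs": 1200, "maxDoc": 1200}
print(deleted_docs(clean))   # 0
```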


-- 
Sincerely yours
Mikhail Khludnev

Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

Posted by David Kramer <Da...@shoebuy.com>.
Thanks for responding, Mikhail.  There are no deleted documents.  Since I’m fairly new to Solr, one of the things I’ve been paranoid about is that I have no way of validating my schema.xml, or of knowing whether Solr is even using it (I have evidence it’s not, more below). So for each test, I’ve wiped out the index, recreated it, and reimported.

Back to whether my schema.xml is being used, I mentioned that I had to come up with a compound UUID field of the first character of the docType plus the ID, and we put “<uniqueKey>uuid</uniqueKey>” (was id) in our schema.xml.  Then I deleted and recreated the index and restarted Solr.  In order to verify it was working, I created an import file that had unique IDs but UUIDs which were duplicates of existing records, and it imported the new records even though the UUIDs existed in the database already.  I’m not sure if Solr should have produced an error or not. I’ll research that, but I mention that here in case it’s relevant.

Thanks.

On 2/2/17, 6:10 AM, "Mikhail Khludnev" <mk...@apache.org> wrote:

    David,
    
    Can you make sure your index doesn't have deleted docs? This can be seen
    in the Solr Admin UI.
    And can you merge the index so that it no longer contains them?
    
    
    -- 
    Sincerely yours
    Mikhail Khludnev
    


Re: Solr querying nested documents with ChildDocTransformerFactory, get “Parent query yields document which is not matched by parents filter”

Posted by Mikhail Khludnev <mk...@apache.org>.
David,

Can you make sure your index doesn't have deleted docs? This can be seen
in the Solr Admin UI.
And can you merge the index to avoid having them in the index?
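
A rough Python sketch of the same check done outside the Admin UI, under some assumptions: the base URL and the core name "catalog" are placeholders, and the sample response values are made up. It builds the URL for the Luke request handler (whose index section reports numDocs and maxDoc), derives the deleted-document count as maxDoc minus numDocs, and builds the update URL with optimize=true, which merges segments and drops deleted documents.

```python
from urllib.parse import urlencode


def luke_url(base, core):
    """URL of the Luke request handler, which reports index-level statistics."""
    return f"{base}/{core}/admin/luke?" + urlencode({"numTerms": 0, "wt": "json"})


def optimize_url(base, core):
    """URL that asks the update handler to optimize (force-merge) the index."""
    return f"{base}/{core}/update?" + urlencode({"optimize": "true"})


def deleted_docs(luke_response):
    """Deleted docs are those still counted in maxDoc but not in numDocs."""
    index = luke_response["index"]
    return index["maxDoc"] - index["numDocs"]


# Placeholder host and core name; adjust to the actual installation.
BASE = "http://localhost:8983/solr"
CORE = "catalog"

# Made-up sample of the relevant part of a parsed Luke JSON response.
sample = {"index": {"numDocs": 1200, "maxDoc": 1350}}

print(luke_url(BASE, CORE))
print(optimize_url(BASE, CORE))
print(deleted_docs(sample))
```

If the count is non-zero, request the optimize URL and check again; it should drop to zero once the merge finishes.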


-- 
Sincerely yours
Mikhail Khludnev