Posted to dev@lucene.apache.org by "Bogdan (JIRA)" <ji...@apache.org> on 2014/11/04 10:39:34 UTC

[jira] [Updated] (SOLR-6700) ChildDocTransformer doesn't return correct children after updating and optimising Solr index

     [ https://issues.apache.org/jira/browse/SOLR-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bogdan updated SOLR-6700:
-------------------------
    Description: 
I have an index with nested documents. 
{code:title=schema.xml snippet|borderStyle=solid}
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="entityType" type="int" indexed="true" stored="true" required="true"/>
<field name="pName" type="string" indexed="true" stored="true"/>
<field name="cAlbum" type="string" indexed="true" stored="true"/>
<field name="cSong" type="string" indexed="true" stored="true"/>
<field name="_root_" type="string" indexed="true" stored="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
{code}

Afterwards I add the following documents:
{code}
<add>
  <doc>
    <field name="id">1</field>
    <field name="pName">Test Artist 1</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">11</field>
        <field name="cAlbum">Test Album 1</field>
        <field name="cSong">Test Song 1</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="pName">Test Artist 2</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">22</field>
        <field name="cAlbum">Test Album 2</field>
        <field name="cSong">Test Song 2</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
</add>
{code}

After performing the following query 
{quote}
http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3DentityType%3A1%7D&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D&wt=json&indent=true
{quote}
I get the correct answer (each child matches its parent; see the _root_ field):
{code:title=query response|borderStyle=solid}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"1",
        "pName":"Test Artist 1",
        "entityType":1,
        "_version_":1483832661048819712,
        "_root_":"1",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"}]},
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]}]
  }}
{code}
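
For readability, here is a minimal sketch, assuming only Python's standard library, that decodes the URL-encoded query above into the plain parameters echoed under responseHeader.params. The variable names are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs

# The select URL exactly as given in the report, split for readability.
url = (
    "http://localhost:8983/solr/collection1/select"
    "?q=%7B!parent+which%3DentityType%3A1%7D"
    "&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D"
    "&wt=json&indent=true"
)

# parse_qs percent-decodes values and turns '+' into a space;
# each parameter occurs once, so keep only the first value.
params = {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}

for name, value in params.items():
    print(f"{name} = {value}")
```

The q parameter is the block join parent query, and the fl parameter carries the [child] transformer whose output is in question.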

Next, I update one document with an atomic update:
{code:title=update doc|borderStyle=solid}
<add>
<doc>
<field name="id">1</field>
<field name="pName" update="set">INIT</field>
</doc>
</add>
{code}

Running the same query again still returns the right result (identical to the previous one, but with the pName field updated).

The problem appears only after performing an optimize.
Now the same query yields the following result:
{code}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"},
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]},
      {
        "id":"1",
        "pName":"INIT",
        "entityType":1,
        "_root_":"1",
        "_version_":1483832916867809280,
        "score":1.0}]
  }}
{code}

As can be seen, the document with id:2 now contains the child with id:11, which belongs to the document with id:1 (see its _root_ field), while the document with id:1 is returned without any children.
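
The three observations (correct after the initial add, still correct after the atomic update, wrong after the optimize) are consistent with children being attached to parents purely by position in the index. The following Python sketch is a toy model, not Solr code: Doc, atomic_update, optimize and children_of are hypothetical names, and it assumes that (a) children are stored immediately before their parent, (b) an atomic update tombstones the old parent and appends the new copy on its own at the end, and (c) a tombstoned parent still marks a block boundary until an optimize purges it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    id: str
    entity_type: int       # 1 = parent, 2 = child, as in the report
    deleted: bool = False  # tombstone that survives until optimize


def atomic_update(index: List[Doc], doc_id: str) -> None:
    """Tombstone the live doc with this id and append a fresh copy at the end."""
    for doc in index:
        if doc.id == doc_id and not doc.deleted:
            doc.deleted = True
            index.append(Doc(doc.id, doc.entity_type))
            return


def optimize(index: List[Doc]) -> List[Doc]:
    """Purge tombstoned docs, keeping the order of the survivors."""
    return [doc for doc in index if not doc.deleted]


def children_of(index: List[Doc], parent_id: str) -> List[str]:
    """Collect live children sitting between each matching parent slot
    (tombstoned or not) and the parent slot before it."""
    kids: List[str] = []
    for pos, doc in enumerate(index):
        if doc.id != parent_id or doc.entity_type != 1:
            continue
        block: List[str] = []
        i = pos - 1
        while i >= 0 and index[i].entity_type != 1:
            if not index[i].deleted:
                block.append(index[i].id)
            i -= 1
        kids.extend(reversed(block))
    return kids


def snapshot(index: List[Doc]) -> Dict[str, List[str]]:
    return {pid: children_of(index, pid) for pid in ("1", "2")}


# Initial add: each child is stored right before its parent.
index = [Doc("11", 2), Doc("1", 1), Doc("22", 2), Doc("2", 1)]
after_add = snapshot(index)

atomic_update(index, "1")   # old parent 1 is tombstoned, new copy appended
after_update = snapshot(index)

index = optimize(index)     # tombstone purged: the boundary after child 11 is gone
after_optimize = snapshot(index)

print(after_add, after_update, after_optimize)
```

Under these assumptions the model gives parent 1 the child 11 and parent 2 the child 22 after the add and after the update, and gives parent 2 both children and parent 1 none after the optimize, which matches the responses shown in this report.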

The only reference to this behaviour that I have found on the web is http://blog.griddynamics.com/2013/09/solr-block-join-support.html


  was:
I have an index with nested documents. 
{code:title=schema.xml snippet|borderStyle=solid}
 <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="entityType" type="int" indexed="true" stored="true" required="true"/>
<field name="pName" type="string" indexed="true" stored="true"/>
<field name="cAlbum" type="string" indexed="true" stored="true"/>
<field name="cSong" type="string" indexed="true" stored="true"/>
<field name="_root_" type="string" indexed="true" stored="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
{code}

Afterwards I add the following documents:
{code}
<add>
  <doc>
    <field name="id">1</field>
    <field name="pName">Test Artist 1</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">11</field>
        <field name="cAlbum">Test Album 1</field>
	    <field name="cSong">Test Song 1</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="pName">Test Artist 2</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">22</field>
        <field name="cAlbum">Test Album 2</field>
	    <field name="cSong">Test Song 2</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
</add>
{code}

After performing the following query 
{quote}
http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3DentityType%3A1%7D&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D&wt=json&indent=true
{quote}
I get a correct answer (child matches parent, check _root_ field)
{code:title=add docs|borderStyle=solid}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"1",
        "pName":"Test Artist 1",
        "entityType":1,
        "_version_":1483832661048819712,
        "_root_":"1",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"}]},
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]}]
  }}
{code}

Afterwards I try to update one document:
{code:title=update doc|borderStyle=solid}
<add>
<doc>
<field name="id">1</field>
<field name="pName" update="set">INIT</field>
</doc>
</add>
{code}

After performing the previous query I get the right result (like the previous one but with the pName field updated).

The problem only comes after performing an optimize. 
Now, the same query yields the following result:
{code}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"},
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]},
      {
        "id":"1",
        "pName":"INIT",
        "entityType":1,
        "_root_":"1",
        "_version_":1483832916867809280,
        "score":1.0}]
  }}
{code}

As can be seen, the document with id:2 now contains the child with id:11 that belongs to the document with id:1. 

I haven't found any references on the web about this except http://blog.griddynamics.com/2013/09/solr-block-join-support.html
{quote}
Let me show you one unlucky example. Let’s remove parent and left children in the index.
<update><delete><query>id:10</query></delete><commit/></update>  
At first, It seems like everything still works. Children 11 and 12 are left in the index, but ToParentBlockJoinQuery somehow detects it and q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL  correctly returns parent 30. However after <optimize/> is executed, deleted parent document is purged from the index and all of the sudden children 11 and 12 start to be considered as if they belong to parent 20! The same query q={!parent which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL now returns 20 and 30 which is wrong! I’m afraid there are few other similar cases of wrong behavior. As a reliable workaround I suggest to send explicit deletes by query with implicit field _root_. I hope this caveat will be fixed in future.
{quote}


> ChildDocTransformer doesn't return correct children after updating and optimising Solr index
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-6700
>                 URL: https://issues.apache.org/jira/browse/SOLR-6700
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Bogdan
>            Priority: Blocker
>             Fix For: 4.10.3, 5.0
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
