Posted to solr-user@lucene.apache.org by Ali Nazemian <al...@gmail.com> on 2014/10/06 09:40:55 UTC

duplicate unique key after partial update in solr 4.10

Dear all,
Hi,
I want to do a partial update on a field that has no value yet. Suppose
I have a document with document id (unique key) '12345' and a field
"read_flag" that was not indexed in the first place, so the read_flag field
has no value for this document. After I did a partial update on this
document to set "read_flag"="true", I ran into a strange problem. The next
time I indexed the same document with the same values, I saw two different
versions of the document with id '12345' in Solr: one with read_flag=true
and another without the read_flag field! I don't want duplicate
documents (and there should not be any, because id is the unique key). Could
you please tell me what caused this problem?
Best regards.

-- 
A.Nazemian
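
The exact update request was not posted with this message. As a minimal sketch, a partial (atomic) update like the one described above is usually sent to Solr's update handler as an XML message of the following shape. The id and field name come from the post; the use of the update="set" modifier, and a setup in which atomic updates are enabled (update log on, fields stored), are assumptions rather than details confirmed in the thread.

```xml
<!-- Sketch of an atomic update: sets read_flag on the existing document 12345 -->
<add>
  <doc>
    <!-- the unique key selects the document to modify -->
    <field name="id">12345</field>
    <!-- update="set" replaces (or creates) only this field's value -->
    <field name="read_flag" update="set">true</field>
  </doc>
</add>
```

Solr applies such an update by rebuilding the document from its stored fields and re-indexing it as a new document, which is worth keeping in mind if the document is part of a parent/child block.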

Re: duplicate unique key after partial update in solr 4.10

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
This seems to be by design; see
https://issues.apache.org/jira/browse/SOLR-5211
You can't update a parent doc that is part of a block.

On Tue, Oct 7, 2014 at 9:44 AM, Ali Nazemian <al...@gmail.com> wrote:




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mk...@griddynamics.com>
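
Given the point above that a parent document inside a block cannot be updated on its own, one possible workaround is to re-send the whole block with the new field already set on the parent, instead of issuing a partial update. This is only a sketch built from the documents posted in this thread, not something the thread itself confirms. Whether the previously indexed children are replaced cleanly may depend on the Solr version, so it should be checked on a test index first.

```xml
<!-- Sketch: re-index parent product01 together with all of its children.
     Field names are copied as posted in the thread. -->
<add>
  <doc>
    <field name="id">product01</field>
    <field name="name">car</field>
    <field name="content_type">product</field>
    <!-- the new value goes into the full parent document, not a partial update -->
    <field name="read_flag">true</field>
    <doc>
      <field name="id">part01</field>
      <field name="name">wheels</field>
      <field name="content_type">part</field>
    </doc>
    <doc>
      <field name="id">part02</field>
      <field name="name">engine</field>
      <field name="doctype">part</field>
    </doc>
    <doc>
      <field name="id">part03</field>
      <field name="name">brakes</field>
      <field name="content_type">part</field>
    </doc>
  </doc>
</add>
```

Sending parent and children together keeps them in one block, which is the unit that block-join queries rely on.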

Re: duplicate unique key after partial update in solr 4.10

Posted by Ali Nazemian <al...@gmail.com>.
The list of docs before doing the partial update:
<doc>
    <field name="id">product01</field>
    <field name="name">car</field>
    <field name="content_type">product</field>
    <doc>
        <field name="id">part01</field>
        <field name="name">wheels</field>
        <field name="content_type">part</field>
    </doc>
    <doc>
        <field name="id">part02</field>
        <field name="name">engine</field>
        <field name="doctype">part</field>
    </doc>
    <doc>
        <field name="id">part03</field>
        <field name="name">brakes</field>
        <field name="content_type">part</field>
    </doc>
</doc>
<doc>
    <field name="id">product02</field>
    <field name="name">truck</field>
    <field name="content_type">product</field>
    <doc>
        <field name="id">part04</field>
        <field name="name">wheels</field>
        <field name="content_type">part</field>
    </doc>
    <doc>
        <field name="id">part05</field>
        <field name="name">flaps</field>
        <field name="doctype">part</field>
    </doc>
</doc>

The list of docs after doing a partial update of the field read_flag for
document "product01":
<doc>
    <field name="id">product01</field>
    <field name="name">car</field>
    <field name="content_type">product</field>
    <field name="read_flag">true</field>
    <doc>
        <field name="id">part01</field>
        <field name="name">wheels</field>
        <field name="content_type">part</field>
    </doc>
    <doc>
        <field name="id">part02</field>
        <field name="name">engine</field>
        <field name="doctype">part</field>
    </doc>
    <doc>
        <field name="id">part03</field>
        <field name="name">brakes</field>
        <field name="content_type">part</field>
    </doc>
</doc>
<doc>
    <field name="id">product02</field>
    <field name="name">truck</field>
    <field name="content_type">product</field>
    <doc>
        <field name="id">part04</field>
        <field name="name">wheels</field>
        <field name="content_type">part</field>
    </doc>
    <doc>
        <field name="id">part05</field>
        <field name="name">flaps</field>
        <field name="doctype">part</field>
    </doc>
</doc>

The list of documents after sending the same documents again (this should
overwrite the previous ones because the IDs are duplicates):
<doc>
    <field name="id">product01</field>
    <field name="name">car</field>
    <field name="content_type">product</field>
    <field name="read_flag">true</field>
</doc>
<doc>
    <field name="id">product01</field>
    <field name="name">car</field>
    <field name="content_type">product</field>
    <doc>
        <field name="id">part01</field>
        <field name="name">wheels</field>
        <field name="content_type">part</field>
    </doc>
    <doc>
        <field name="id">part02</field>
        <field name="name">engine</field>
        <field name="doctype">part</field>
    </doc>
    <doc>
        <field name="id">part03</field>
        <field name="name">brakes</field>
        <field name="content_type">part</field>
    </doc>
</doc>
<doc>
    <field name="id">product02</field>
    <field name="name">truck</field>
    <field name="content_type">product</field>
    <doc>
        <field name="id">part04</field>
        <field name="name">wheels</field>
        <field name="content_type">part</field>
    </doc>
    <doc>
        <field name="id">part05</field>
        <field name="name">flaps</field>
        <field name="doctype">part</field>
    </doc>
</doc>

But as you can see, there are two different versions of the document with the
same ID (product01).

Regards.

On Mon, Oct 6, 2014 at 8:18 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:




-- 
A.Nazemian

Re: duplicate unique key after partial update in solr 4.10

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Can you upload the update documents then (into a Gist or similar)?
Just so that people don't have to re-imagine the exact steps. Because, if
it fully checks out, it might be a bug and the next step would be
creating a JIRA ticket.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 6 October 2014 11:23, Ali Nazemian <al...@gmail.com> wrote:

Re: duplicate unique key after partial update in solr 4.10

Posted by Ali Nazemian <al...@gmail.com>.
Dear Alex,
Hi,
LOL, yeah I am sure. You can test it yourself. I did that on the default
schema too. The results are the same!
Regards.

On Mon, Oct 6, 2014 at 4:20 PM, Alexandre Rafalovitch <ar...@gmail.com>
wrote:




-- 
A.Nazemian

Re: duplicate unique key after partial update in solr 4.10

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
A stupid question: are you sure that the field the schema treats as your
uniqueId really is the unique id in your setup? Also, that you are not
somehow using flags that tell Solr to ignore duplicates?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
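
The two things asked about above can be checked roughly as follows. This is a sketch, assuming the stock schema.xml layout and the XML update format. The field name id is taken from the documents in this thread; the actual schema and requests used were not posted.

```xml
<!-- schema.xml: the field Solr treats as the unique key -->
<uniqueKey>id</uniqueKey>

<!-- Update message: overwrite="false" tells Solr not to replace an existing
     document with the same unique key, so duplicates become possible.
     The default is overwrite="true". -->
<add overwrite="false">
  <doc>
    <field name="id">12345</field>
  </doc>
</add>
```

If uniqueKey names a different field, or documents are added with overwrite="false", two documents with the same id can coexist in the index.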


On 6 October 2014 03:40, Ali Nazemian <al...@gmail.com> wrote: