Posted to solr-user@lucene.apache.org by Michael Sokolov <ms...@safaribooksonline.com> on 2014/12/17 22:33:31 UTC

converting to parent/child block indexing

Have other people tried migrating an index that was created without 
block (parent/child) indexing to one that *does* have it?  Did you find 
that you got duplicate documents, i.e. multiple documents with the same 
uniqueKey value?  That's what I found, and I don't see how that's 
possible.

What I *think* happened was:

Before:

I had various documents in the database, and the unique key field was all 
set up correctly: when I reindexed, each document would overwrite the 
existing document (delete, then update, I guess).

I changed my indexer (this is using a customized version of haystack) to 
submit nested document updates instead, so now some of the formerly 
"standalone" documents became child documents, and others, parents.

After reindexing:

I had double copies of all the documents: two documents with the same 
(uniqueKey) id.  If I re-indexed again, the parent/child copies would 
be overwritten, but a second "standalone" copy seemed to persist (its 
_version_ was unchanged).  Is the uniqueKey field not being applied to 
child documents somehow?

Pragmatically speaking it seems I just need to wipe the index and start 
over, but I wonder if that is expected?

-Mike

Re: converting to parent/child block indexing

Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Thanks, Mikhail!  That explains the situation pretty well.

-Mike



Re: converting to parent/child block indexing

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hm.. really sorry about that. The current implementation is not really
ideal, you know.
When Solr handles an update, it tries to recognize whether it is a block
or not, and in fact it uses the _root_ field to enforce uniqueness. There
are a few consequences:
 - the _root_ field spans the whole block, not just the parent document
 - the current heuristic (block/not-block) is straightforward and doesn't
support flipping. An index is either blocked or it's not blocked. Your case
is the opposite of https://issues.apache.org/jira/browse/SOLR-5211
As a workaround, before you send a block with id:66, send a deleteQuery for
this id. Wiping the index is also an option, for sure!
Note, this might be another intriguing approach:
http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html
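
The delete-then-add workaround could be sketched roughly as below. This is
only an illustration, not haystack's or Solr's own client code: the helper
names, the "type" field and the sample ids are made up, and deleting the
child ids along with the parent id is an assumption based on the fact that
the children were standalone documents before. It builds the two XML update
messages and leaves sending them to your indexer.

```python
import xml.etree.ElementTree as ET


def quote_term(value):
    """Wrap a value in double quotes for a Solr query, escaping \\ and "."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_delete_by_query(ids, key_field="id"):
    """Build a <delete><query>..</query></delete> message matching the ids."""
    query = " OR ".join(f"{key_field}:{quote_term(i)}" for i in ids)
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


def _doc_element(fields, children=()):
    """Build one <doc>, with any child documents nested inside it."""
    doc = ET.Element("doc")
    for name, value in fields.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    for child in children:
        doc.append(_doc_element(child))
    return doc


def build_block_add(parent, children):
    """Build an <add> message holding one parent/child block."""
    add = ET.Element("add")
    add.append(_doc_element(parent, children))
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample data: one parent with two children.
parent = {"id": "66", "type": "book"}
children = [{"id": "66-1", "type": "chapter"},
            {"id": "66-2", "type": "chapter"}]

# Send delete_xml first, then add_xml, so old standalone copies are gone
# before the block arrives.
delete_xml = build_delete_by_query([parent["id"]] + [c["id"] for c in children])
add_xml = build_block_add(parent, children)
print(delete_xml)
print(add_xml)
```

Both messages would be posted to the core's update handler, delete first.
The ids are quoted in the query so that characters such as "-" or ":" in an
id are not interpreted as query syntax.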



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics