Posted to solr-user@lucene.apache.org by Michael Sokolov <ms...@safaribooksonline.com> on 2014/12/17 22:33:31 UTC
converting to parent/child block indexing
Have other people tried migrating an index that was created without
block (parent/child) indexing to one that *does* have it? Did you find
that you got duplicate documents, i.e. multiple documents with the same
uniqueField value? That's what I found, and I don't see how that's
possible.
What I *think* happened was:
Before:
I had various documents in the database, and the unique key field was
set up correctly: when I reindexed, each document would overwrite the
existing one (delete, then add, I guess).
I changed my indexer (this is using a customized version of Haystack) to
submit nested document updates instead, so some of the formerly
"standalone" documents became child documents and others became parents.
After reindexing:
I had two copies of every document: two documents with the same
(uniqueField) id. If I reindexed again, the parent/child copies would
be overwritten, but a second "standalone" copy seemed to persist (its
_version_ was unchanged). Is the unique key not being enforced for
child documents somehow?
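The indexer change described above boils down to switching between two JSON update payload shapes. A rough Python sketch, assuming the indexer posts Solr's JSON update format; the record contents and ids are hypothetical, and only `id` (the unique key) and `_childDocuments_` (the JSON key Solr uses for nested documents) carry meaning to Solr:

```python
import json

# Hypothetical records: one parent and two children. Only "id" and
# "_childDocuments_" mean anything to Solr; "type" is illustrative.
PARENT = {"id": "66", "type": "book"}
CHILDREN = [{"id": "66-1", "type": "chapter"},
            {"id": "66-2", "type": "chapter"}]


def flat_payload(parent, children):
    """Before: every record is sent as its own standalone document."""
    return json.dumps([parent] + list(children))


def nested_payload(parent, children):
    """After: the children travel inside the parent as a single block."""
    block = dict(parent, _childDocuments_=list(children))
    return json.dumps([block])


if __name__ == "__main__":
    print(flat_payload(PARENT, CHILDREN))
    print(nested_payload(PARENT, CHILDREN))
```

Both payloads carry the same ids; what changes is how Solr decides which existing documents to overwrite.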
Pragmatically speaking, it seems I just need to wipe the index and start
over, but is that expected?
-Mike
Re: converting to parent/child block indexing
Posted by Michael Sokolov <ms...@safaribooksonline.com>.
Thanks, Mikhail! That explains the situation pretty well.
-Mike
On 12/17/14 4:49 PM, Mikhail Khludnev wrote:
> Hm.. really sorry about that. The current implementation is not really
> ideal, you know.
Re: converting to parent/child block indexing
Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Hm.. really sorry about that. The current implementation is not really
ideal, you know.
When it handles an update, it tries to recognize whether it is a block or
not, and for a block it in fact uses the _root_ field to enforce uniqueness.
There are a few consequences:
- the _root_ field spans the whole block, not just the parent
- the current block/non-block heuristic is straightforward and doesn't
support flipping: an index is either blocked or it's not. Your case is
the opposite of https://issues.apache.org/jira/browse/SOLR-5211
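To make the consequences above concrete, here is a toy Python model of the two update paths as described in this message. It is not Solr's code: a standalone add overwrites by `id`, a block add overwrites by `_root_` (the parent's id, stamped on every document in the block), and, as an assumption of the model, standalone documents carry no `_root_` value at all, so a block update never matches them.

```python
class ToyIndex:
    """Toy model of the two overwrite rules; not Solr's implementation."""

    def __init__(self):
        self.docs = []  # stored documents, each a dict of field -> value

    def add_standalone(self, doc):
        # Non-block update: delete whatever has the same unique key, then add.
        # Model assumption: standalone docs get no _root_ field.
        self.docs = [d for d in self.docs if d["id"] != doc["id"]]
        self.docs.append(dict(doc))

    def add_block(self, parent, children):
        # Block update: delete by _root_ (the parent's id), then add the
        # children followed by the parent, all stamped with that _root_.
        root = parent["id"]
        self.docs = [d for d in self.docs if d.get("_root_") != root]
        for doc in list(children) + [parent]:
            self.docs.append(dict(doc, _root_=root))

    def count(self, doc_id):
        return sum(1 for d in self.docs if d["id"] == doc_id)


if __name__ == "__main__":
    idx = ToyIndex()
    # Before: everything indexed as standalone documents.
    idx.add_standalone({"id": "66"})
    idx.add_standalone({"id": "66-1"})
    # After: the same ids re-sent as one block; the old copies survive.
    idx.add_block({"id": "66"}, [{"id": "66-1"}])
    print(idx.count("66"), idx.count("66-1"))  # 2 2
    # Re-sending the block replaces only the block copies.
    idx.add_block({"id": "66"}, [{"id": "66-1"}])
    print(idx.count("66"), idx.count("66-1"))  # 2 2
```

In this model the second block add replaces the previous block but never touches the standalone copies, which matches the symptom of a "standalone" copy whose _version_ never changes.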
As a workaround, before you send a block with id:66, send a delete query
for that id. Wiping the index is also an option, for sure!
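One way to apply this workaround from a Python indexer is to put the delete and the block add in a single JSON update body. This is a sketch under assumptions: that Solr's JSON update command format applies the commands in the order they appear, and that `id` is a string field. The helper names are hypothetical. Since the stale copies reported in this thread included the children, this version deletes the child ids as well as the parent id, which goes slightly beyond the workaround as stated.

```python
import json


def quote_term(value):
    # Quote the id as a phrase so characters like '-' or ':' are not read
    # as query syntax; escape backslashes and double quotes inside it.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def replace_as_block(parent, children):
    """Delete stale standalone copies of every id, then add the block."""
    ids = [parent["id"]] + [child["id"] for child in children]
    query = "id:(" + " OR ".join(quote_term(i) for i in ids) + ")"
    block = dict(parent, _childDocuments_=list(children))
    # json.dumps keeps key order; assumes Solr applies commands in order,
    # so the delete runs before the add.
    return json.dumps({"delete": {"query": query}, "add": {"doc": block}})


def wipe_index():
    """The blunt alternative: delete everything, then commit."""
    return json.dumps({"delete": {"query": "*:*"}, "commit": {}})


if __name__ == "__main__":
    print(replace_as_block({"id": "66"}, [{"id": "66-1"}]))
```

Once every document has been re-sent as part of a block, the block update's own _root_-based overwrite should cover later reindexing, so the extra delete is mainly useful during the migration.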
Note that this might be another intriguing approach:
http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mk...@griddynamics.com>