Posted to solr-user@lucene.apache.org by Thomas Scheffler <th...@uni-jena.de> on 2014/05/19 08:36:51 UTC

trigger delete on nested documents

Hi,

I plan to use nested documents to group some of my fields:

<doc>
<field name="id">art0001</field>
<field name="title">My first article</field>
   <doc>
     <field name="id">art0001-foo</field>
     <field name="name">Smith, John</field>
     <field name="role">author</field>
   </doc>
   <doc>
     <field name="id">art0001-bar</field>
     <field name="name">Power, Max</field>
     <field name="role">reviewer</field>
   </doc>
</doc>

This way I can ask for any documents that are reviewed by Max Power.
However, to simplify updates and deletes, I want to ensure that nested
documents are deleted automatically on update and delete of the parent
document.
Has anyone had to deal with this problem and found a solution?

regards,

Thomas
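
A minimal sketch of how the "reviewed by Max Power" lookup could be
phrased with Solr's block join parent query parser ({!parent which=...}).
The field doc_type and its value "parent" are assumptions that do not
appear in the example above: the "which" filter has to match all parent
documents and only parent documents, so some marker of that kind is
needed. Whether the quoted name matches also depends on the field type
in the schema. The helper name reviewed_by_query is hypothetical.

```python
def reviewed_by_query(name, parent_filter="doc_type:parent"):
    """Build a block join query string that returns parent documents
    having a child with role 'reviewer' and the given name.

    parent_filter is an assumed marker query; it must match every
    parent document and no child document.
    """
    # Escape backslashes and double quotes so the name is a safe phrase.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    child_query = '+role:reviewer +name:"%s"' % escaped
    return '{!parent which="%s"}%s' % (parent_filter, child_query)


print(reviewed_by_query("Power, Max"))
```

The resulting string would be sent as the q parameter of a normal
select request.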

Re: trigger delete on nested documents

Posted by Thomas Scheffler <th...@uni-jena.de>.
On 19.05.2014 08:38, Walter underwood wrote:
> Solr does not support nested documents.  -- wunder

It does since 4.5:

http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/common/SolrInputDocument.html#addChildDocuments(java.util.Collection)

But this feature is rather poorly documented and has some caveats:

http://blog.griddynamics.com/2013/09/solr-block-join-support.html

regards,

Thomas


Re: trigger delete on nested documents

Posted by Walter underwood <wu...@wunderwood.org>.
Solr does not support nested documents.  -- wunder


Re: trigger delete on nested documents

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Thomas, I left a few comments on particular cases at
https://issues.apache.org/jira/browse/SOLR-6096 and would really like to
follow up on your issues... Y U NO TXT ME BACK??





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: trigger delete on nested documents

Posted by Thomas Scheffler <th...@uni-jena.de>.
On 20.05.2014 14:11, Jack Krupansky wrote:
> To be clear, you cannot update a single document of a nested document
> in place - you must reindex the whole block, parent and all children.
> This is because this feature relies on the underlying Lucene block
> join feature that requires that the documents be contiguous, and
> updating a single child document would make it discontiguous with the
> rest of the block of documents.
>
> Just update the block by resending the entire block of documents.
>
> For a previous discussion of this limitation:
> http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html

This is totally clear to me, and I want nested documents not to be
accessible without their root context.

There seems to be no way to delete the whole block by the id of the root
document, and no way to update the root document that also removes the
stale data from the index. Normal SOLR behavior is to automatically
delete old documents with the same ID. I expect this behavior for the
other documents in this block, too.

Anyway, to make things clear, I filed a JIRA issue and tried to explain
it more carefully there:

https://issues.apache.org/jira/browse/SOLR-6096

regards

Thomas

Re: trigger delete on nested documents

Posted by Jack Krupansky <ja...@basetechnology.com>.
To be clear, you cannot update a single document of a nested document in 
place - you must reindex the whole block, parent and all children. This is 
because this feature relies on the underlying Lucene block join feature that 
requires that the documents be contiguous, and updating a single child 
document would make it discontiguous with the rest of the block of 
documents.

Just update the block by resending the entire block of documents.

For a previous discussion of this limitation:
http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html
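
As an illustration of "resending the entire block", the following is a
minimal sketch that builds a complete XML update message for one parent
and all of its children, using the article from the original question.
The helper names doc_element and block_add_xml are hypothetical, and
posting the payload to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def doc_element(fields, children=()):
    """Build a <doc> element from a dict of field values, with optional
    nested child <doc> elements."""
    doc = ET.Element("doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    for child_fields in children:
        doc.append(doc_element(child_fields))
    return doc


def block_add_xml(parent_fields, children):
    """Wrap the parent and all of its children in a single <add> message,
    so the whole block is reindexed together."""
    add = ET.Element("add")
    add.append(doc_element(parent_fields, children))
    return ET.tostring(add, encoding="unicode")


xml_payload = block_add_xml(
    {"id": "art0001", "title": "My first article"},
    [
        {"id": "art0001-foo", "name": "Smith, John", "role": "author"},
        {"id": "art0001-bar", "name": "Power, Max", "role": "reviewer"},
    ],
)
print(xml_payload)
```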

-- Jack Krupansky



Re: trigger delete on nested documents

Posted by Thomas Scheffler <th...@uni-jena.de>.
On 19.05.2014 19:25, Mikhail Khludnev wrote:
> Thomas,
>
> The vanilla way to overwrite a block is to send it again with the same
> unique key (I guess it's "id" in your case; by the way, don't you have a
> unique key defined in the schema?), but it must have at least one child.
> It seems like an analysis issue to me:
> https://issues.apache.org/jira/browse/SOLR-5211
>
> While a block is indexed, the special field _root_, equal to the
> <unique-key> value of the root document, is added across the whole block
> (caveat: it's not stored by default). At least you can issue
>
> <delete><query>_root_:PK_VAL</query></delete>
>
> to wipe the whole block.

Thank you for the insightful information. It helps a lot in
understanding. The '_root_' field of this rather poorly documented
SOLR feature was new to me. It already helps when I perform single
updates and deletes on the index. BUT:

If I delete by query, this results in a mess:

1.) request all IDs returned by that query
2.) fire a giant delete query with "id:(id1 OR .. OR idn) _root_:(id1 OR
.. OR idn)"
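
The two steps above can be sketched as follows for step 2, assuming the
IDs from step 1 are already available as a list. The helper names
quote_term and block_delete_query are hypothetical. Unlike the query
quoted above, this sketch joins the two clauses with an explicit OR
instead of relying on the default operator.

```python
def quote_term(value):
    """Quote an ID as a phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def block_delete_query(ids):
    """Build one query that matches the given documents by id and every
    document whose _root_ field points at one of them."""
    if not ids:
        raise ValueError("at least one id is required")
    terms = " OR ".join(quote_term(i) for i in ids)
    return "id:(%s) OR _root_:(%s)" % (terms, terms)


print(block_delete_query(["art0001", "art0002"]))
```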

Before every update of a single document I have to fire a delete request.

This turns into a mess when updating in batch mode:
1.) remove a chunk of 100 documents and their nested documents (see above)
2.) index the chunk of 100 documents
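
A minimal client-side sketch of this batch loop. The callbacks
delete_blocks and index_docs are hypothetical placeholders for whatever
the application uses to send the delete and add requests; only the
chunking and the delete-then-index ordering are shown.

```python
from itertools import islice


def chunks(items, size=100):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def reindex_in_batches(docs, delete_blocks, index_docs, size=100):
    """For each chunk, first delete the old blocks by root id, then index
    the new versions.

    docs: iterable of dicts that each carry an "id" key.
    delete_blocks, index_docs: hypothetical callbacks supplied by the caller.
    """
    for chunk in chunks(docs, size):
        delete_blocks([doc["id"] for doc in chunk])
        index_docs(chunk)
```

Every chunk still costs two requests, which is the extra network
traffic described above.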

All the information needed for that is available on the SOLR side. Can I
configure some hook that is executed on the SOLR server so that I do not
have to change all applications? This would at least save these extra
network transfers.

After a lot of work migrating from plain Lucene to SOLR, I really need
proper nested document support. Elasticsearch seems to support it
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html),
but I am afraid of another migration. Elasticsearch even hides the
nested documents in query results, which seems nice, too.

Does anyone have information on how nested document support will evolve
in future releases of SOLR?

kind regards,

Thomas



Re: trigger delete on nested documents

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Thomas,

The vanilla way to overwrite a block is to send it again with the same
unique key (I guess it's "id" in your case; by the way, don't you have a
unique key defined in the schema?), but it must have at least one child.
It seems like an analysis issue to me:
https://issues.apache.org/jira/browse/SOLR-5211

While a block is indexed, the special field _root_, equal to the
<unique-key> value of the root document, is added across the whole block
(caveat: it's not stored by default). At least you can issue

<delete><query>_root_:PK_VAL</query></delete>

to wipe the whole block.
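
A minimal sketch that fills in PK_VAL for a concrete root id and builds
the delete message shown above. The helper name delete_block_xml is
hypothetical, and quoting the id as a phrase is an assumption made here
so that special characters in the key do not change the query.

```python
import xml.etree.ElementTree as ET


def delete_block_xml(root_id):
    """Build a <delete> message that removes every document whose _root_
    field equals root_id, i.e. the parent and all of its children."""
    # Escape backslashes and double quotes inside the quoted phrase.
    escaped = root_id.replace("\\", "\\\\").replace('"', '\\"')
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    # ElementTree takes care of XML-escaping the query text.
    query.text = '_root_:"%s"' % escaped
    return ET.tostring(delete, encoding="unicode")


print(delete_block_xml("art0001"))
```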
