Posted to dev@lucene.apache.org by "Mikhail Khludnev (JIRA)" <ji...@apache.org> on 2012/06/11 22:59:42 UTC

[jira] [Created] (SOLR-3535) Add block support for XMLLoader

Mikhail Khludnev created SOLR-3535:
--------------------------------------

             Summary: Add block support for XMLLoader
                 Key: SOLR-3535
                 URL: https://issues.apache.org/jira/browse/SOLR-3535
             Project: Solr
          Issue Type: Sub-task
          Components: update
    Affects Versions: 4.1, 5.0
            Reporter: Mikhail Khludnev
            Priority: Minor


I'd like to add the following update xml message:

<add-block>
    <doc>....</doc>
    <doc>....</doc>
</add-block>

out of scope for now: 
* other update formats
* update log support (NRT), should not be a big deal
* overwrite feature support for block updates - it's more complicated, I'll tell you why

Alt
* wdyt about adding an attribute to the current tag: {pre}<add block="true">{pre}
* or we can establish a RunBlockUpdateProcessor which treats every <add> ....</add> as a block.

*Test is included!!*
How would you suggest improving the patch?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294569#comment-13294569 ] 

Hoss Man commented on SOLR-3535:
--------------------------------

bq. The necessity to treat multiple docs as a single update introduces complexity into the update processor chain regardless.

Exactly.  No matter how we deal with this sort of thing in the external (xml/json/etc) APIs, or in the internal (SolrInputDocument) APIs, the UpdateRequestProcessors are going to need to be changed to explicitly understand the relationships of these docs -- so let's model things in the way that makes the most sense and work from there -- with the added bonus that modeling things the way they make the most sense should also be the easiest way to make it work with SolrCloud.

My suggestion for an order of iterative implementation:

1) add "List<SolrInputDocument> getChildDocuments()" to SolrInputDocument
2) make RunUpdateProcessor do the right thing with child docs
3) make the JavaBinCodec aware of getChildDocuments() so solrj can serialize/deserialize (which should mean SolrCloud can propagate them transparently)
4) get basic tests of hierarchical doc updates/deletes working in both standalone and solrcloud mode

Then lots of other stuff can be done in parallel and doesn't gate each other...

* syntax in various loaders
** XML
** json
** DIH entities
* change simple update processors to know about nested docs (ie: field mutators)
* add new options/processors for more complex update processor use cases (ie: we'll probably want SignatureUpdateProcessor to be able to do something with the nested docs, etc...)

...but the bottom line is all of that stuff -- even the XML syntax -- is really secondary to understanding the right way to deal with it in the internal APIs, and in my opinion that's modeling as a true hierarchy in the SolrInputDocument class.
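
A minimal sketch of what step 1 could look like, assuming a hypothetical stand-in class named NestedDoc rather than the real SolrInputDocument (the method names other than getChildDocuments() are illustrative and not taken from the patch). Because children are themselves NestedDoc instances, the hierarchy is not limited to a single level:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SolrInputDocument; not the real Solr class.
class NestedDoc {
    private final Map<String, Object> fields = new LinkedHashMap<>();
    private final List<NestedDoc> children = new ArrayList<>();

    NestedDoc setField(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    Object getFieldValue(String name) {
        return fields.get(name);
    }

    // Illustrative helper; only getChildDocuments() is named in the thread.
    NestedDoc addChildDocument(NestedDoc child) {
        children.add(child);
        return this;
    }

    // Mirrors the proposed "List<SolrInputDocument> getChildDocuments()".
    List<NestedDoc> getChildDocuments() {
        return children;
    }

    // Counts this document plus all of its descendants, at any depth.
    int countDocs() {
        int total = 1;
        for (NestedDoc child : children) {
            total += child.countDocs();
        }
        return total;
    }
}
```

Code that does not know about children simply never calls getChildDocuments(), so existing field handling is left alone.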

                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294678#comment-13294678 ] 

Hoss Man commented on SOLR-3535:
--------------------------------

bq. I don't feel that this rich model is covered with single level parent-child well.

who said anything about a "single level" ? .. if SolrInputDocument can have a List<SolrInputDocument> of children, then those children can have other children, etc..

bq. PK field is a blocker for transparent handling scoped docs by the current processors. i.e. I don't think it's mandatory to provide PK field for every child document (most time it's useless and redundant info)

Agreed, but i don't see how it's a blocker - if the children hang off of the top most parent, then as long as that parent has a uniqueKey, all of the distributed stuff (and any update processors that care about uniqueKey) should be fine ... processors that want to be aware of sub-documents might have to worry about it, and we have to think through how deletes by id should work (so that children are automatically removed and not inherited by the adjacent parent doc) but those are going to be issues that need to be thought through/solved regardless of how we model the nested docs in the processor chain API.

bq. field update processors can work wrong if the same field name is present in several scopes - name clash between different relations/scopes

a) that seems like an argument in favor of continuing to give the processors a single top level SolrInputDocument with all of its children hanging off of it in a hierarchy, instead of adding a new AddBlockCommand that contains a flattened list of documents -- because the processors won't have any way of knowing if/when to treat some docs differently.

b) like other things i mentioned earlier, that really seems like a secondary concern -- for many use cases either the field names will be distinct, or can be made distinct for the purposes of using this feature.  Update processors can (eventually) be made smarter to know to only operate on certain documents by "type" but any solution like that which would work on a sequential list of documents like in your "AddBlockCommand" suggestion could also work on a true hierarchy of SolrInputDocuments (where it would have the actual hierarchy to help inform its behavior)

bq. why new api/property is necessary? is solrInputDoc.addField("skus", new Object[]{sku1, sku2, sku3}) not enough?

Are you suggesting we model child documents as objects (SolrInputDocuments i guess?) in a special field? ... what if i put child documents in multiple fields? would that signify the different types of child?  how would solr model that in the (lucene) Documents when giving them to the IndexWriter?  How would solr know how to order the children from multiple fields/lists when creating the block?  Wouldn't the "type of child" information be better living in the child documents themselves?  (particularly since that "type" information needs to be in the child documents anyway so that the filter query for a BJQ can be specified.)

It also seems like it would require code that wants to know what children exist in a document to do a lot of work to find that out (need to iterate over every field in the SolrInputDocument and do reflection to see if they are child-documents or not)

Another concern off the top of my head is that a lot of existing code (including any custom update processors people might have) would assume those child documents are multivalued field values and would probably break -- hence a new method on SolrInputDocument seems wiser (code that doesn't know about it may not do what you want, but at least it won't break)

bq. there is a *pre*processors chain which deal with scoped documents and flatten them - there should be two of them: block-join (bjq counterpart); denormalizer (grouping counterpart); fk-copier for query-time join;

i don't really understand the need for this.  i'm at a complete loss as to what you mean by "fk-copier for query-time join", but your suggestion for a new type of processor chain that can flatten/denormalize documents seems like it could easily be implemented using the existing UpdateProcessorChain code -- assuming we let SolrInputDocuments have other SolrInputDocuments as children.  Couldn't you just write a new "FlattenDocumentUpdateProcessor" such that anytime it gets a SolrInputDocument with children, it creates new AddDocCommands containing those children (adding whatever flattened fields from the parent that it wants) and executes them?
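
To make the "FlattenDocumentUpdateProcessor" idea concrete, here is a rough sketch of just the flattening step. It assumes hypothetical stand-in classes (Doc, Flattener) instead of the real Solr types, and it assumes the set of parent fields to copy down is configured by name; a real processor would presumably wrap each resulting document in its own add command and pass it down the chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a SolrInputDocument that can hold children.
class Doc {
    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<Doc> children = new ArrayList<>();

    Doc field(String name, Object value) { fields.put(name, value); return this; }
    Doc child(Doc c) { children.add(c); return this; }
}

// Hypothetical flattener: turns one nested document into independent flat ones.
class Flattener {
    private final List<String> inherited; // names of fields copied down to descendants

    Flattener(List<String> inheritedFieldNames) {
        this.inherited = inheritedFieldNames;
    }

    List<Doc> flatten(Doc root) {
        List<Doc> out = new ArrayList<>();
        collect(root, new LinkedHashMap<>(), out);
        return out;
    }

    private void collect(Doc doc, Map<String, Object> fromAncestors, List<Doc> out) {
        Doc flat = new Doc();
        flat.fields.putAll(fromAncestors); // denormalized ancestor fields first
        flat.fields.putAll(doc.fields);    // the document's own fields win on a clash
        out.add(flat);

        // Values handed to children: what we inherited plus this doc's inheritable fields.
        Map<String, Object> down = new LinkedHashMap<>(fromAncestors);
        for (String name : inherited) {
            if (doc.fields.containsKey(name)) {
                down.put(name, doc.fields.get(name));
            }
        }
        for (Doc c : doc.children) {
            collect(c, down, out);
        }
    }
}
```

Letting a child's own value override an inherited one is only one possible answer to the field name clash raised above; it is a choice made for this sketch.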

bq. for distributed processor AddBlockCommand should have PK - it's preprocessors' duty

but that doesn't address the issues yonik and i raised about all of the distributed update & transaction log code that already exists revolving around forwarding *documents* and recording their unique key.  What is the advantage of introducing a new AddBlockCommand that also has to have a unique key, and would need to be forwarded around atomically when we could just use the top level parent document with all of the existing distributed update code as is?
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Mikhail Khludnev (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294650#comment-13294650 ] 

Mikhail Khludnev commented on SOLR-3535:
----------------------------------------

Assuming that, at a high level of abstraction, the app deals with multiple levels of nesting and multiple relations: a model has several SKUs/UPCs and several Discounts with regional and temporal attributes.
* I don't feel that this rich model is covered with single level parent-child well.
* Don't you want to provide two ways to deal with relation content: index time block join and traditional join/grouping with FK fields and denormalization?
* PK field is a blocker for transparent handling scoped docs by the current processors. i.e. I don't think it's mandatory to provide PK field for every child document (most time it's useless and redundant info)
* field update processors can work wrong if the same field name is present in several scopes - name clash between different relations/scopes 
* why new api/property is necessary? is solrInputDoc.addField("skus", new Object[]{sku1, sku2, sku3}) not enough?

I propose the following design: 
* there is a *pre*processors chain which deals with scoped documents and flattens them - there should be two of them: block-join (bjq counterpart); denormalizer (grouping counterpart); fk-copier for query-time join;
* update processors can handle AddUpdateCommand and AddBlockCommand as well (a kind of default loop behaviour can be supplied in an abstract class)
* for distributed processor AddBlockCommand should have PK - it's preprocessors' duty
                


[jira] [Updated] (SOLR-3535) Add block support for XMLLoader

Posted by "Mikhail Khludnev (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-3535:
-----------------------------------

    Attachment: SOLR-3535.patch

new attachment highlights:
* UpdReqProc.processAdd() becomes protected
* FlattenerUpdateProcessorFactory has been introduced.
 * it transforms the given SolrInputDocument with nested subdocs into a block of SIDs
 * i.e. it transforms AddUpdCmd to AddBlockUpdCmd
 * nested subdocs are placed as a collection of SID fields. Hoss, excuse me. It's not really my point - we can later switch to getChildrenDocs(). It just seems easier to me for now. Let's polish it later.
 * you can see that Flattener is placed between Distributed and Log/Run, i.e. I addressed your point, guys - make it compatible with the distributed update magic.
 * forgot to cover three levels of nesting, my fault. It's a trivial transitive closure via recursion. I'll switch to iteration later.
* XMLLoader supports <field name="skills"><doc>..</doc><doc>..</doc><doc>..</doc><doc>..</doc></field> - these are nested docs.
* there are tests, a little puzzling. The test harness is harmed a little.

Please have a look, and raise your concerns!
btw it might be easier to read here: https://github.com/m-khl/solr-patches/commits/blockupdate
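
For the "switch to iteration" point, a possible shape of a non-recursive linearization is sketched below. The classes Sid and BlockLinearizer are hypothetical stand-ins, not code from the patch, and the ordering is an assumption: descendants are emitted before their ancestors so that the top-level parent ends up last in the block, which is, as far as I understand, the order Lucene's block join expects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a nested SolrInputDocument (SID).
class Sid {
    final String id;
    final List<Sid> children = new ArrayList<>();

    Sid(String id) { this.id = id; }
    Sid add(Sid child) { children.add(child); return this; }
}

class BlockLinearizer {
    // Iterative post-order walk: every document appears after all of its
    // descendants, so the root is the last element of the returned block.
    static List<Sid> toBlock(Sid root) {
        Deque<Sid> pending = new ArrayDeque<>(); // explicit stack replaces recursion
        Deque<Sid> block = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Sid doc = pending.pop();
            block.addFirst(doc); // prepend, so ancestors drift to the end
            for (Sid child : doc.children) {
                pending.push(child);
            }
        }
        return new ArrayList<>(block);
    }
}
```

The explicit stack handles any nesting depth without growing the call stack.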
 





                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294313#comment-13294313 ] 

Ryan McKinley commented on SOLR-3535:
-------------------------------------

I think the suggestion is to make nestable SolrInputDocuments.  In XML this would be something like:
{code:xml}
<add>

<doc>
  <field name="id">1</field>
  <field name="name">Parent</field>
  <doc>
    <field name="id">2</field>
    <field name="name">Child 1</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="name">Child 2</field>
  </doc>
</doc>

</add>

{code}



                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295049#comment-13295049 ] 

Yonik Seeley commented on SOLR-3535:
------------------------------------

bq. 1) add "List<SolrInputDocument> getChildDocuments()" to SolrInputDocument

Or simply allow SolrInputDocument *as* a normal value and existing APIs could be used to add them.
This would also be slightly more powerful, allowing more than one child list for the same parent.
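
A small sketch of this alternative, assuming a hypothetical stand-in class InputDoc with multi-valued fields (not the real SolrInputDocument). Child documents are stored as ordinary field values, so one parent can carry several named child lists, but finding them means scanning every value:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SolrInputDocument with multi-valued fields.
class InputDoc {
    final Map<String, List<Object>> fields = new LinkedHashMap<>();

    // Appends a value to a field; the value may itself be an InputDoc.
    InputDoc addField(String name, Object value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    // Finds child documents by checking the type of every value of every field,
    // grouped by the name of the field (relation) that holds them.
    Map<String, List<InputDoc>> childRelations() {
        Map<String, List<InputDoc>> relations = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> entry : fields.entrySet()) {
            for (Object value : entry.getValue()) {
                if (value instanceof InputDoc) {
                    relations.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                             .add((InputDoc) value);
                }
            }
        }
        return relations;
    }
}
```

With this shape, a product could hold one list under "skus" and another under "discounts"; the trade-off is that code unaware of the convention sees those values as plain field values.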

                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Mikhail Khludnev (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294293#comment-13294293 ] 

Mikhail Khludnev commented on SOLR-3535:
----------------------------------------

Yonik,

do you mean that XMLLoader (as well as the other loaders) should produce some form of hierarchical document, and some update processor will linearize that hierarchy into a sequence and pass it into AddBlockCommand?

also, how do you suggest modeling the doc hierarchy: as a SolrInputDoc subclass with explicit relation collections, or as some magic collection values in the current SolrInputDocument?

I'd like to emphasize the overall complexity - a parent doc can have _several_ subdoc relations like SKUs/UPCs and Discounts, etc.

PS pls check the parent issue SOLR-3076. There is a dilemma about which feature set to provide; your proposal is closer to the "magic-knows-everything" schema approach. I don't strongly disagree with it, but I just want to start with a pretty neat ability first. Anyway, looking forward to hearing your suggestions.

                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294500#comment-13294500 ] 

Yonik Seeley commented on SOLR-3535:
------------------------------------

bq. At first blush, I think we'd want a single SolrInputDocument passing through the update processor chain AddUpdateCommand, and that document would be nested.  You'd need to keep that entire nested structure as a single unit for as long as possible I think.

Yes, exactly.  That also gives the most power... update processors that care about the structure of the nested documents can get to it.

bq. But of course there are cons.... update processors would need to be coded to handle nested documents explicitly,

The necessity to treat multiple docs as a single update introduces complexity into the update processor chain regardless.  If not a nested SolrInputDocument, then we'd need to pass along a List<SolrInputDocument> to keep them together anyway, and document mutating processors would need to change to iterate over this list.  For simple mutating processors, we can have a utility class that visits each document.

Of course the biggest benefit of treating it as a single document is that all the solr cloud stuff we've done (transaction log recovery, peer sync, update forwarding, per-doc replication, etc.) should all just work.
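
The "utility class that visits each document" could look roughly like the sketch below. All names here (NestedInputDoc, DocVisitor, TrimFields) are hypothetical and chosen for illustration; the trimming mutator is just an example of a simple field-mutating processor that does not care about the nesting structure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a nested SolrInputDocument.
class NestedInputDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<NestedInputDoc> children = new ArrayList<>();
}

// Hypothetical helper: applies an action to a document and all its descendants.
final class DocVisitor {
    private DocVisitor() {}

    static void visitAll(NestedInputDoc root, Consumer<NestedInputDoc> action) {
        action.accept(root);
        for (NestedInputDoc child : root.children) {
            visitAll(child, action);
        }
    }
}

// Example of a simple mutating step written once and reused at every level.
class TrimFields {
    static void apply(NestedInputDoc root) {
        DocVisitor.visitAll(root, doc ->
            doc.fields.replaceAll((name, value) ->
                value instanceof String ? ((String) value).trim() : value));
    }
}
```

A processor that does care about structure can skip the helper and walk the children itself.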
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Mikhail Khludnev (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403367#comment-13403367 ] 

Mikhail Khludnev commented on SOLR-3535:
----------------------------------------

Could somebody review the last patch? If introducing getChildrenDocs() is the only blocker, I'll move on it.
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294348#comment-13294348 ] 

Ryan McKinley commented on SOLR-3535:
-------------------------------------

off hand, the big complication I see is how to deal with UpdateProcessorChains -- should processors expect the nested SolrInputDoc or a flattened version?
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Simon Rosenthal (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293761#comment-13293761 ] 

Simon Rosenthal commented on SOLR-3535:
---------------------------------------

Mikhail:
It's not clear to me from the code/comments exactly what this issue/patch is meant to accomplish. I'm assuming that the intention is to be able to atomically add every document in the block at once?

That is a use case which I have encountered (a batch update of a set of records with new product price information, where you want to commit them only when the complete set has been indexed, regardless of autocommits being fired off or other processes issuing commits). If that's the intention, this patch is great!

I attempted to address the problem of undesired autocommits in SOLR-2664 (enable/disable autocommit on the fly), but that patch is very out of date.

I do think it should be extended to updates in CSV/JSON and updates using the SolrJ API.

+1 for Erik's suggestion on the syntax.


                


[jira] [Comment Edited] (SOLR-3535) Add block support for XMLLoader

Posted by "Mikhail Khludnev (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293858#comment-13293858 ] 

Mikhail Khludnev edited comment on SOLR-3535 at 6/12/12 7:15 PM:
-----------------------------------------------------------------

@Simon,
the intention of this patch is to provide indexing support for the parent ticket SOLR-3076. The BJQ (block join query) magic is explained at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I'm going to rework the patch this week.
                
                  


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Mikhail Khludnev (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293858#comment-13293858 ] 

Mikhail Khludnev commented on SOLR-3535:
----------------------------------------

@Simon,
the intention of this patch is to provide indexing support for the parent ticket SOLR-3076. The BJQ (block join query) magic is explained at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

I'm going to rework the patch this week.
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294466#comment-13294466 ] 

Erik Hatcher commented on SOLR-3535:
------------------------------------

bq. off hand, the big complication I see is how to deal with UpdateProcessorChains -- should processors expect the nested SolrInputDoc or a flattened version?

At first blush, I think we'd want a single SolrInputDocument passing through the update processor chain in the AddUpdateCommand, and that document would be nested.  You'd need to keep that entire nested structure as a single unit for as long as possible, I think.

But of course there are cons: update processors would need to be coded to handle nested documents explicitly, since currently something like language detection would automatically operate only on the outer parent document.  Hmmm.
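
To make that trade-off concrete, here is a minimal sketch of how a processor written for flat documents could be wrapped so that it also visits child documents. It assumes a hypothetical `NestedDoc` stand-in rather than Solr's actual SolrInputDocument, and `tagLanguage` / `applyToAll` are illustrative names, not anything from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a nested input document; NOT Solr's SolrInputDocument.
final class NestedDoc {
    final Map<String, Object> fields = new LinkedHashMap<>();
    final List<NestedDoc> children = new ArrayList<>();

    NestedDoc set(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    NestedDoc child(NestedDoc c) {
        children.add(c);
        return this;
    }
}

final class RecursiveProcessorSketch {
    // Toy stand-in for a processor such as language detection: it only
    // touches the document it is handed and ignores any children.
    static void tagLanguage(NestedDoc doc) {
        doc.set("lang", "en");
    }

    // Applies a per-document step to the parent and then to every descendant.
    static void applyToAll(NestedDoc doc, Consumer<NestedDoc> step) {
        step.accept(doc);
        for (NestedDoc c : doc.children) {
            applyToAll(c, step);
        }
    }
}
```

Calling `tagLanguage(parent)` directly leaves the children untouched, which is the "only the outer parent" behaviour described above; `applyToAll(parent, RecursiveProcessorSketch::tagLanguage)` is one way a nested-aware chain might reuse the same step for every level.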
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293924#comment-13293924 ] 

Erik Hatcher commented on SOLR-3535:
------------------------------------

bq. It seems like what we really want to express here is nested documents.

Great point, and I totally concur that the input should be hierarchical for the block join queries.  But do we also need a slightly lower-level, direct (non-hierarchical) way to call IndexWriter#addDocuments()?  Or is the Solr need here purely about hierarchy modeling?
                


[jira] [Updated] (SOLR-3535) Add block support for XMLLoader

Posted by "Mikhail Khludnev (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Khludnev updated SOLR-3535:
-----------------------------------

    Attachment: SOLR-3535.patch
    


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295189#comment-13295189 ] 

Hoss Man commented on SOLR-3535:
--------------------------------

bq. Or simply allow SolrInputDocument as a normal value and existing APIs could be used to add them.  This would also be slightly more powerful, allowing more than one child list for the same parent.

"allow SolrInputDocument as a normal value" ... a normal value to what? where? ... are you describing the same thing as Mikhail, i.e. modeling "children" SolrInputDocuments as field values of the parent SolrInputDocument?

If so then i ask you the same questions i asked him above...

{quote}
bq. why new api/property is necessary? is solrInputDoc.addField("skus", new Object[]{sku1, sku2, sku3}) not enough?

Are you suggesting we model child documents as objects (SolrInputDocuments i guess?) in a special field? ... what if i put child documents in multiple fields? would that signify the different types of child? how would solr model that in the (lucene) Documents when giving them to the IndexWriter? How would solr know how to order the children from multiple fields/lists when creating the block? Wouldn't the "type of child" information be better living in the child documents themselves? (particularly since that "type" information needs to be in the child documents anyway so that the filter query for a BJQ can be specified.)

It also seems like it would require code that wants to know what children exist in a document to do a lot of work to find that out (need to iterate every field in the SolrInputDocument and do reflection to see if they are child-documents or not)

Another concern off the top of my head is that a lot of existing code (including any custom update processors people might have) would assume those child documents are multivalued field values and would probably break; hence a new method on SolrInputDocument seems wiser (code that doesn't know about it may not do what you want, but at least it won't break)
{quote}
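
The two modeling options being compared can be sketched side by side. This is only an illustration under stated assumptions: `Doc` is a hypothetical stand-in for SolrInputDocument, and `childDocs`, `childrenFromFieldValues` and `childrenFromDedicatedList` are made-up names, not existing Solr API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChildLookupSketch {
    // Hypothetical stand-in for an input document; NOT Solr's SolrInputDocument.
    static final class Doc {
        final Map<String, List<Object>> fields = new LinkedHashMap<>();
        // Dedicated child list, i.e. the "new method" alternative.
        final List<Doc> childDocs = new ArrayList<>();

        void addField(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    // Children modeled as field values: every value of every field has to be
    // type-checked to find them.
    static List<Doc> childrenFromFieldValues(Doc doc) {
        List<Doc> found = new ArrayList<>();
        for (List<Object> values : doc.fields.values()) {
            for (Object v : values) {
                if (v instanceof Doc) {
                    found.add((Doc) v);
                }
            }
        }
        return found;
    }

    // Children in a dedicated list: no scan, and code that only reads
    // `fields` never meets a document where it expects a plain value.
    static List<Doc> childrenFromDedicatedList(Doc doc) {
        return doc.childDocs;
    }
}
```

With the first option, existing code that iterates `fields` would encounter `Doc` instances among the values of "skus"; with the second, such code simply never sees the children.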
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293867#comment-13293867 ] 

Yonik Seeley commented on SOLR-3535:
------------------------------------

It seems like what we really want to express here is nested documents.  Directly expressing that in the transfer syntax (XML, JSON, or binary) would seem more natural and also allow us to handle/express multiple levels of nesting.  This also frees the user from having to think about details such as where the parent document goes (at the beginning or the end?).

Internally representing a parent and its child documents as a single SolrInputDocument also has a lot of benefits and seems like the easiest path to get this working with all of the existing code (like transaction logging, forwarding docs based on ID in cloud mode, etc).
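
As a rough illustration of why a nested representation frees the user from thinking about parent placement, the sketch below flattens a nested document into one contiguous block with a post-order walk, so every parent lands after all of its descendants (the parent-last order that Lucene's block join expects, as far as I understand it). `Doc` and `toBlockIds` are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlockFlattenSketch {
    // Hypothetical stand-in for a nested input document, identified only by id.
    static final class Doc {
        final String id;
        final List<Doc> children;

        Doc(String id, Doc... children) {
            this.id = id;
            this.children = Arrays.asList(children);
        }
    }

    // Post-order walk: all descendants of a document are emitted before the
    // document itself, so each parent closes its own block.
    static List<String> toBlockIds(Doc root) {
        List<String> out = new ArrayList<>();
        append(root, out);
        return out;
    }

    private static void append(Doc doc, List<String> out) {
        for (Doc child : doc.children) {
            append(child, out);
        }
        out.add(doc.id);
    }
}
```

For a "product" whose children are "sku1" (itself the parent of "offer1") and "sku2", `toBlockIds` yields offer1, sku1, sku2, product, which also shows how multiple levels of nesting fall out of the same walk.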


                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Erik Hatcher (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293101#comment-13293101 ] 

Erik Hatcher commented on SOLR-3535:
------------------------------------

bq. wdyt about adding attribute to the current tag {pre}<add block="true">{pre}

I like this better than coming up with a new <add-block> element, personally.
                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Jack Krupansky (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294504#comment-13294504 ] 

Jack Krupansky commented on SOLR-3535:
--------------------------------------

Maybe there should be a "block-doc-aware" interface/base class that update processors can implement or extend. If an update processor does implement that interface then it can be passed the new hierarchical doc/list in one shot. But for "legacy" update processors (that don't implement "block-doc-aware") each of the sub-documents and the parent document would get passed one at a time.
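
A minimal sketch of that dispatch idea follows, assuming hypothetical `DocProcessor` and `BlockAwareProcessor` interfaces and a stand-in `Doc` class; none of these are existing Solr types. The comment does not say in which order a legacy processor should see the documents, so the children-then-parent order used here is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockAwareDispatchSketch {
    // Hypothetical stand-in for a parent document with one level of children.
    static final class Doc {
        final String id;
        final List<Doc> children = new ArrayList<>();

        Doc(String id) {
            this.id = id;
        }
    }

    // What a "legacy" processor implements: one flat document at a time.
    interface DocProcessor {
        void process(Doc doc);
    }

    // Hypothetical opt-in interface for processors that understand blocks.
    interface BlockAwareProcessor extends DocProcessor {
        void processBlock(Doc parentWithChildren);
    }

    // Block-aware processors get the whole hierarchy in one call; all others
    // get each child and then the parent (this ordering is an assumption).
    static void dispatch(DocProcessor processor, Doc parent) {
        if (processor instanceof BlockAwareProcessor) {
            ((BlockAwareProcessor) processor).processBlock(parent);
            return;
        }
        for (Doc child : parent.children) {
            processor.process(child);
        }
        processor.process(parent);
    }
}
```

The appeal of this shape is that existing processors keep working unchanged, while only those that opt in have to reason about the hierarchy.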

                


[jira] [Commented] (SOLR-3535) Add block support for XMLLoader

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295680#comment-13295680 ] 

Yonik Seeley commented on SOLR-3535:
------------------------------------

bq. If so then i ask you the same questions i asked him above...

We don't necessarily need to have all the answers today (about how modeling in the index would work)... we just need to realize that we may want that extra power / generality in the future.  Don't think just in terms of BJQ - there will be other ways to use the fact that we know a sequence of related docs are contiguous.

bq. what if i put child documents in multiple fields?

This is what I think we should support to future-proof this.

bq. would that signify the different types of child?

Yes.

bq. how would solr model that in the (lucene) Documents when giving them to the IndexWriter?

We could index a "type" field based on the field name (among other possibilities including allowing the user to specify type).

bq. How would solr know how to order the children from multiple fields/lists when creating the block?

It hopefully shouldn't matter.  child_list1, child_list2, parent
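
The last two answers can be sketched together: children stored under several parent fields are emitted list by list, each stamped with a type derived from the field name, and the parent goes last. Everything here is hypothetical, including the `Doc` stand-in, the `childLists` map and the choice of "type" as the field name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TypedChildrenSketch {
    // Hypothetical stand-in: plain fields plus child lists keyed by field name.
    static final class Doc {
        final Map<String, String> fields = new LinkedHashMap<>();
        final Map<String, List<Doc>> childLists = new LinkedHashMap<>();

        Doc(String id) {
            fields.put("id", id);
        }

        Doc addChild(String fieldName, Doc child) {
            childLists.computeIfAbsent(fieldName, k -> new ArrayList<>()).add(child);
            return this;
        }
    }

    // Emits child_list1, child_list2, ..., parent. Each child is stamped with
    // a "type" field (an assumed field name) taken from the name of the
    // parent field it was stored under. Note that this mutates the children.
    static List<Doc> toBlock(Doc parent) {
        List<Doc> block = new ArrayList<>();
        for (Map.Entry<String, List<Doc>> e : parent.childLists.entrySet()) {
            for (Doc child : e.getValue()) {
                child.fields.put("type", e.getKey());
                block.add(child);
            }
        }
        block.add(parent);
        return block;
    }
}
```

Because the type ends up on each child document, a filter on that field could tell the child lists apart regardless of where they sit inside the block.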

bq. Wouldn't the "type of child" information be better living in the child documents themselves? (particularly since that "type" information needs to be in the child documents anyway so that the filter query for a BJQ can be specified.)

Yes, but it's not that clear to me how this is related to allowing documents as field values.

bq. It also seems like it would require code that wants to know what children exist in a document to do a lot of work to find that out (need to iterate every field in the SolrInputDocument and do reflection to see if they are child-documents or not)

Yeah, this is what bothers me the most.
The code complexity part shouldn't be an issue - that can be wrapped up into a utility class that visits all child documents.  But I don't particularly like the fact that we need to go through looking at every field value each time.  On the other hand, I also wouldn't like to duplicate the get/set API for child documents on SolrInputDocument.

bq. Another concern off the top of my head is that a lot of existing code (including any custom update processors people might have) would assume those child documents are multivalued field values and would probably break

I don't think we should worry too much about custom update processors - it's (IMO) more of an expert-level thing, this is 4.0, and most document mutating processors I can think of probably also want to operate on child docs too.

Related: see SOLR-139
I used a Map to represent an "extended value" for a field, and then go through and check for any of those to see if it's an "update" and what type it is.
It has the same downsides you describe above.  The upsides were that everything pretty much just worked w/o modification (update forwarding, javabin serialization, etc).
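
The "Map as extended value" pattern mentioned for SOLR-139 might look roughly like the sketch below: ordinary values pass through untouched, while a single-entry Map is read as an update instruction whose key names the operation. The operation names used in the usage note ("set", "add") and the `findUpdateOps` helper are illustrative assumptions, not taken from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExtendedValueSketch {
    // Returns field name -> operation name for every field whose value is a
    // single-entry Map, treated here as an "extended value" describing an update.
    // Plain values (strings, numbers, ...) are skipped.
    static Map<String, String> findUpdateOps(Map<String, Object> fields) {
        Map<String, String> ops = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Object value = e.getValue();
            if (value instanceof Map && ((Map<?, ?>) value).size() == 1) {
                Object op = ((Map<?, ?>) value).keySet().iterator().next();
                ops.put(e.getKey(), String.valueOf(op));
            }
        }
        return ops;
    }
}
```

Given fields `id = "doc1"`, `price = {set: 9.99}` and `tags = {add: "sale"}`, the helper reports `price -> set` and `tags -> add` and ignores `id`. The cost is the one noted above: every field value has to be inspected on each pass.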
                